CONDITIONAL QUANTILE SEQUENTIAL ESTIMATION FOR STOCHASTIC CODES

T. LABOPIN-RICHARD, F. GAMBOA, AND A. GARIVIER 


Abstract. This paper is devoted to the sequential estimation of a conditional quantile 
in the context of real stochastic codes with vector-valued inputs. Our algorithm is a 
combination of k-nearest neighbours and of a Robbins-Monro estimator. We discuss
the convergence of the algorithm under some conditions on the stochastic code. We 
provide non-asymptotic rates of convergence of the mean square error and we discuss 
the tuning of the algorithm’s parameters. 


1. Introduction 

Computer code experiments have encountered, in the last decades, a growing interest 
among statisticians in several fields (see m and references therein and also [I9l|26l[22l 
EHIIS]...). In the absence of noise, a numerical black box $g : \mathbb{R}^d \to \mathbb{R}$ maps an input
vector $X$ to $Y = g(X) \in \mathbb{R}$. When the black box does include some randomness, the
code is called stochastic and the model is as follows: a random vector $\varepsilon \in \mathbb{R}^m$, called
random seed, models the stochasticity of the function, while $X$ is a random vector. The
random seed and the input are assumed to be stochastically independent. The map $g$
(which satisfies some regularity assumption specified below) is defined on $\mathbb{R}^d \times \mathbb{R}^m$ and
outputs

\[ Y = g(X, \varepsilon), \tag{1} \]

hence yielding possibly different values for the same input $X$. One observes a sample of
$(X, Y)$, without having access to the details of $g$. However, those observations are often
expensive (for example when $g$ has a high computational complexity) and one aims at
rapidly learning some properties of interest of $g$.

We focus in this work on the estimation of the conditional quantile of the output $Y$
given the input $X$. For a given level $\alpha \in [1/2, 1)$ and for every possible input $x \in \mathbb{R}^d$,
the target is

\[ \theta^*(x) := q_\alpha(g(x, \varepsilon)), \quad x \in \mathbb{R}^d, \]

where $q_\alpha(Z) := F_Z^{-1}(\alpha)$ is the quantile of level $\alpha$ of the random variable $Z$ and
$F_Z^{-1}(u) := \inf\{x : F_Z(x) \ge u\}$ is the generalized inverse of the cumulative distribution
function of $Z$. Moreover, we would like to estimate such a quantile for different values of $x$.
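
To make the definition of the generalized inverse concrete, here is a minimal Python sketch applied to the empirical distribution function of a finite sample. The function name `generalized_inverse_quantile` and the small numerical tolerance are illustrative assumptions, not part of the paper.

```python
import math

def generalized_inverse_quantile(sample, alpha):
    """Return inf{x : F_n(x) >= alpha}, where F_n is the empirical CDF of `sample`.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    ordered = sorted(sample)
    n = len(ordered)
    # F_n(ordered[k-1]) >= k/n, and F_n(t) <= (k-1)/n for t < ordered[k-1],
    # so the infimum is reached at the smallest k with k/n >= alpha.
    # The tiny tolerance guards against floating-point noise in n * alpha.
    k = max(1, math.ceil(n * alpha - 1e-12))
    return ordered[k - 1]
```

For a sample of four distinct values, level 0.5 selects the second smallest value and level 0.95 selects the largest one.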

1.1. The algorithm. For a fixed value of $x$, there are several well-known procedures to
estimate the quantile $\theta^*(x)$. Given a sample of $Y^x := g(x, \varepsilon)$, the empirical
quantile is a solution. For a sequential estimation, one may use a Robbins-Monro [23]
estimator. This method iteratively approximates the zero of a function
$h : \mathbb{R} \to \mathbb{R}$ by a sequence of estimators defined by induction: $\theta_0 \in \mathbb{R}$ and for all $n \ge 0$,

\[ \theta_{n+1} = \theta_n - \gamma_{n+1} H(\theta_n, Z_{n+1}). \]

Here, $(\gamma_n)$ is the learning rate (a deterministic step-size sequence), $(Z_n)$ is an i.i.d. sample
of observations, and $H$ is a noisy version of $h$. Denoting $\mathcal{F}_n := \sigma(Z_1, \dots, Z_n)$, $H$ is such
that

\[ E(H(\theta_n, Z_{n+1}) \mid \mathcal{F}_n) = h(\theta_n). \]

Classical conditions for the choice of the step sizes $(\gamma_n)$ are

\[ \sum_n \gamma_n^2 < \infty \quad \text{and} \quad \sum_n \gamma_n = \infty. \]

These conditions ensure the convergence of the estimates under weak assumptions. For
example, convergence in mean square is studied in [23], almost sure consistency is
considered in [71128], asymptotic rates of convergence are given in [I311S1125], while large
deviations principles are investigated in m. There has been a recent interest in
non-asymptotic results. Risk bounds under Gaussian concentration assumption (see m)
and finite time bounds on the mean square error under strong convexity assumptions
(see [111128] and references therein) have been given. Quantile estimation corresponds
to the choice $h : t \mapsto F(t) - \alpha$, where $F$ is the cumulative distribution function of the
target distribution. One can show that the estimator

\[ \theta_0 \in \mathbb{R}, \qquad \theta_{n+1} = \theta_n - \gamma_{n+1} \left( \mathbf{1}_{Z_{n+1} \le \theta_n} - \alpha \right) \tag{2} \]

is consistent and asymptotically Gaussian (see m chapters 1 and 2 for proofs and
details). It is important to recall, however, that the lack of strong convexity prevents
most non-asymptotic results from being applied directly, except when the density is
lower-bounded. We nevertheless mention that Godichon et al. prove in mm such
non-asymptotic results for the adaptation of Algorithm (2) to the case where $Z$ is a random
variable on a Hilbert space of dimension higher than 2.
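
The Robbins-Monro quantile recursion described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the step-size exponent 0.75 and the uniform test distribution are assumptions chosen for the example, not choices made in the paper.

```python
import random

def robbins_monro_quantile(draw, alpha, n_steps, theta0=0.0, gamma_exp=0.75):
    """Sequentially estimate the alpha-quantile of the law sampled by `draw`.

    Implements theta_{n+1} = theta_n - gamma_{n+1} * (1{Z_{n+1} <= theta_n} - alpha)
    with the step sizes gamma_n = n ** (-gamma_exp) (assumed exponent).
    """
    theta = theta0
    for n in range(1, n_steps + 1):
        z = draw()                      # new observation Z_n
        step = n ** (-gamma_exp)        # deterministic step size gamma_n
        indicator = 1.0 if z <= theta else 0.0
        theta -= step * (indicator - alpha)
    return theta

# Illustration on the uniform law on [0, 1], whose 0.9-quantile is 0.9.
rng = random.Random(0)
estimate = robbins_monro_quantile(rng.random, alpha=0.9, n_steps=20000)
```

With an exponent in (1/2, 1], the step sizes satisfy the two classical summability conditions stated above.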

Of course, unless $x$ can take only a small finite number of different values, it is not
possible to use this algorithm with a sample of $Y^x$ for each $x$. Even more, when the code
has a high computational complexity, the overall number of observations (all values of $x$
included) must remain small, and we need an algorithm using only one limited sample
$(X_i, Y_i)_{i=1,\dots,n}$ of $(X, Y)$. Then, the problem is more difficult. For each value of $x$, we need
to estimate the quantile of the conditional distribution given $x$ using a biased sample. To
address this issue, we propose to embed Algorithm (2) into a non-parametric estimation
procedure. For a fixed input $x$, the new algorithm only takes into account the pairs
$(X_i, Y_i)$ for which the input $X_i$ is close to $x$, and thus (presumably) the law of $Y_i$ is close
to that of $Y^x$. To set up this idea, we use the $k$-nearest neighbours method, introducing
the sequential estimator:

\[ \theta_0(x) \in \mathbb{R}, \qquad \theta_{n+1}(x) = \theta_n(x) - \gamma_{n+1} \left( \mathbf{1}_{Y_{n+1} \le \theta_n(x)} - \alpha \right) \mathbf{1}_{X_{n+1} \in kNN_{n+1}(x)}, \tag{3} \]

where

• $kNN_n(x)$ is the set of the $k_n$ nearest neighbours of $x$ for the Euclidean norm on
$\mathbb{R}^d$. Denoting by $\|X - x\|_{(i,n)}$ the $i$-th order statistic of a sample $(\|X_i - x\|)_{i=1,\dots,n}$
of size $n$, we have

\[ \{X_{n+1} \in kNN_{n+1}(x)\} = \{\|X_{n+1} - x\| \le \|X - x\|_{(k_{n+1}, n)}\}. \]

In this work, we discuss choices of the form $k_n = \lfloor n^\beta \rfloor$ for $0 < \beta < 1$, $n \in \mathbb{N}^*$.

• $(\gamma_n)$ is the deterministic step-size sequence. We also study the case $\gamma_n = n^{-\gamma}$ for
$0 < \gamma \le 1$, $n \in \mathbb{N}^*$.
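
The sequential estimator combining the nearest-neighbour localization with the Robbins-Monro update can be sketched as follows. This is a hedged illustration, not the authors' implementation: the names `knn_robbins_monro` and `square_code_stream`, the convention that the very first observation is always used, and the values β = 0.7, γ = 0.55 are assumptions made for the example. The test stream is a toy code of the form Y = X² + ε with uniform X and ε, for which the 0.95-quantile at x = 0.5 equals 0.70.

```python
import bisect
import math
import random

def knn_robbins_monro(stream, x, alpha, beta, gamma, theta0=0.0):
    """k-nearest-neighbour localized Robbins-Monro quantile estimate at input x.

    `stream` yields pairs (X_i, Y_i), with X_i a tuple of coordinates.
    k_n = floor(n ** beta) and gamma_n = n ** (-gamma), as in the text.
    """
    theta = theta0
    dists = []  # sorted distances ||X_i - x|| of the past inputs
    for n, (x_new, y_new) in enumerate(stream):  # n = number of past observations
        d_new = math.dist(x_new, x)
        k = max(1, math.floor((n + 1) ** beta))  # k_{n+1}, always <= n when n >= 1
        # X_{n+1} is a neighbour iff its distance is at most the k_{n+1}-th order
        # statistic of the n past distances (assumption: always update when n == 0).
        if n == 0 or d_new <= dists[k - 1]:
            step = (n + 1) ** (-gamma)
            indicator = 1.0 if y_new <= theta else 0.0
            theta -= step * (indicator - alpha)
        bisect.insort(dists, d_new)
    return theta

def square_code_stream(n, rng):
    """Toy stochastic code Y = X^2 + eps, X ~ U([0, 1]), eps ~ U([-0.5, 0.5])."""
    for _ in range(n):
        x_i = rng.random()
        yield (x_i,), x_i ** 2 + rng.uniform(-0.5, 0.5)

rng = random.Random(1)
theta_hat = knn_robbins_monro(square_code_stream(20000, rng), x=(0.5,),
                              alpha=0.95, beta=0.7, gamma=0.55, theta0=0.3)
```

Keeping the past distances in a sorted list makes the neighbour test a single index lookup; a more careful implementation could use a balanced tree to avoid the linear-time insertion.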

The $k$-nearest neighbours method of localization first appears in [29, 30] for the
estimation of conditional expectations. In [6], Bhattacharya et al. apply it to the
(non-recursive) estimation of the conditional quantile function for real-valued inputs. If the
number of neighbours $k_n$ is small, then few observations are used and the estimation
is highly noisy; on the other hand, if $k_n$ is large, then values of $Y_i$ may be used that have
a distribution significantly different from the target. The challenge is thus to tune $k_n$ so
as to reach an optimal balance between bias and variance.

In this work, this tuning is combined with the choice of the learning rate. The main
objective of this work is thus to optimize the two parameters of Algorithm (3), i.e.
the step size $\gamma_n$ and the number of neighbours $k_n$. The paper is organized as follows:
Section 2 deals with the almost sure convergence of the algorithm. Further, it contains
the main result of our paper, a non-asymptotic inequality on the mean square
error from which an optimal choice of parameters is derived. In Section 3 we present
some numerical simulations to illustrate our results. The technical points of the proofs
are deferred to Section 5.


2. Main results 

We explain here how to tune the parameters of the algorithm. We also provide 
conditions allowing theoretical guarantees of convergence. Before that, we start by some 
notation and technical assumptions. 

2.1. Notation and assumptions. The constants appearing in the sequel are of three
different types:

1) $(L, U)$ denote lower and upper bounds for the support of random variables.
They are indexed by the names of those variables;

2) $(N_i)_{i \in \mathbb{N}^*}$ are integers denoting the first ranks after which some properties hold;

3) $(C_i)_{i \in \mathbb{N}^*}$ are positive real numbers used for other purposes.

Without further precision, constants of type 2) and 3) only depend on the model, that
is, on $g$ and on the distribution of $(\varepsilon, X)$. Further, we will denote by $C_i(u)$ or $N_i(u)$, for
$u \in \mathcal{P}(\{\alpha, x, d\})$ (the power set of $\{\alpha, x, d\}$), constants depending on the model and on
the elements of $u$ among the probability level $\alpha$, the point $x$ and the dimension $d$. The
values of all the constants are summarized in the Appendix.

For any random variable $Z$, we denote by $F_Z$ its cumulative distribution function. We
denote by $\mathcal{B}_x$ the set of the balls of $\mathbb{R}^d$ centred at $x$. For $B \in \mathcal{B}_x$, we denote by $r_B$ its
radius and for $r_B > 0$, we call $Y^B$ a random variable with distribution $\mathcal{L}(Y \mid X \in B)$.

Remark 2.1. If the pair $(X, Y)$ has a density $f_{(X,Y)}$ such that the marginal density $f_X(x)$
is positive, then we can compute the density of $\mathcal{L}(Y \mid X = x)$ by

\[ f_{Y \mid X = x} = \frac{f_{(X,Y)}(x, \cdot)}{f_X(x)}, \]

and when $B = \{x\}$,

\[ Y^B = Y^x := g(x, \varepsilon) \sim \mathcal{L}(Y \mid X = x). \]

We will make four assumptions. The first one is hardly avoidable, since we deal with
$k$-nearest neighbours. The three others are more technical.

Assumption A1 For all $x$ in the support of $X$ (that we will denote $\mathrm{Supp}(X)$ in the
sequel), there exists a constant $M(x)$ such that the following inequality holds:

\[ \forall B \in \mathcal{B}_x, \ \forall t \in \mathbb{R}, \quad |F_{Y^B}(t) - F_{Y^x}(t)| \le M(x)\, r_B. \]

In words, we assume that the stochastic code is sufficiently smooth. The laws of two
responses corresponding to two different but close inputs are not completely different.
The assumption is clearly required, since we want to approximate the law $\mathcal{L}(Y^x)$ by the
law $\mathcal{L}(Y \mid X \in kNN_n(x))$.

Remark 2.2. If we consider random vector supported by x M, we can show that 
Assumption A1 holds, for example, as soon as $(X, Y)$ has a regular density. In all
cases, it is easier to prove this assumption when the pair $(X, Y)$ has a density. See
Subsection 3.1 for an example. 


Assumption A2 The law of $X$ has a density and this density is lower-bounded by a
constant $C_{input} > 0$ on $\mathrm{Supp}(X)$.

This hypothesis implies in particular that the law of $X$ has a compact support. Notice
that this kind of assumption is usual in the $k$-nearest neighbours context (see for example

m)- 

Assumption A3 The code function $g$ takes its values in a compact set $[L_Y, U_Y]$.

Under Assumption A3 and if $\beta > \gamma$, then

\[ \sqrt{C_1} := \max(U_Y - L_Y + (1 - \alpha),\ U_Y + \alpha - L_Y) = U_Y - L_Y + \alpha \]

is a bound of $|\theta_n(x) - \theta^*(x)|$ (see Lemma 5.8 in Appendix).

Assumption A4 For all $x$, the law of $g(x, \varepsilon)$ has a density which is lower-bounded by
a constant $C_g(x) > 0$ on its support.

Lemma 2.1. Denoting $C_2(x, \alpha) := \min(C_g(x), \dots)$, we have thanks to Assumption A4,

\[ \forall n \in \mathbb{N}^*, \quad [F_{Y^x}(\theta_n(x)) - F_{Y^x}(\theta^*(x))]\,[\theta_n(x) - \theta^*(x)] \ge C_2(x, \alpha)\,[\theta_n(x) - \theta^*(x)]^2. \tag{4} \]

Proof. When $\theta_n(x) \in [L_Y, U_Y]$, it is obvious that inequality (4) holds for $C_2 := C_g(x)$.
When $\theta_n(x) \in [L_{\theta_n}, L_Y]$, we have

\[ L_{\theta_n} \le \theta_n(x) \le L_Y \le \theta^*(x), \]

and then $F_{Y^x}(\theta_n(x)) = 0$. Thus,

{9n{x) - 9*{x)){FY^{9nix)) - Fy^{9*{x))) = {9n{x) - 9*(x)) 


2 ( 0 -«) 

9n{x) - 9* 


—a 


> C2ix,a){9n{x) - 9*{x)f . 


The proof of the last case follows similarly using that 6*2 (x, a, d) > ■ □ 

This assumption is useful to deal with non-asymptotic inequalities for the mean square
error. It is the substitute for the strong convexity assumption made in [21], which does not
hold in the case of the quantile.

2.2. Almost sure convergence. The following theorem studies the almost sure
convergence of our algorithm.

Theorem 2.1. Let $x$ be a fixed input. Under Assumptions A1 and A2, Algorithm (3)
is almost surely convergent whenever $1/2 < \gamma < \beta < 1$.

Sketch of proof: In the sequel, we still denote $\mathcal{F}_n := \sigma(X_1, \dots, X_n, Y_1, \dots, Y_n)$, and
$\mathbb{E}_n$ and $\mathbb{P}_n$ the conditional expectation and probability given $\mathcal{F}_n$. For the sake of
simplicity, we denote

\[ H(\theta_n(x), X_{n+1}, Y_{n+1}) := \left( \mathbf{1}_{Y_{n+1} \le \theta_n(x)} - \alpha \right) \mathbf{1}_{X_{n+1} \in kNN_{n+1}(x)}. \]

The proof is organized in three steps.

1) We decompose $H(\theta_n(x), X_{n+1}, Y_{n+1})$ as a sum of a drift and a martingale
increment:

\[ h_n(\theta_n) := E(H(\theta_n, X_{n+1}, Y_{n+1}) \mid \mathcal{F}_n) \quad \text{and} \quad H(\theta_n, X_{n+1}, Y_{n+1}) := h_n(\theta_n) + \xi_{n+1}. \]

Then,

\[ T_n := \theta_n(x) + \sum_{j=1}^{n} \gamma_j h_{j-1}(\theta_{j-1}(x)) \]

is a martingale which is bounded in $L^2$. So it converges almost surely.

2) We show the almost sure convergence of $(\theta_n)_n$.

a) First, we check that $(\theta_n)$ does not diverge to $+\infty$ or $-\infty$.

b) Then, we prove that $(\theta_n)$ converges almost surely to a finite limit.

3) We conclude by identifying the limit: $\theta^*(x)$, the conditional quantile.

Steps 2a), 2b) and 3) are shown by contradiction. The key point is that almost surely,
after a certain rank, $h_n(\theta_n) > 0$. This property is ensured by Assumptions A1 and A2.
The entire proof is available in Section 5.


Comments on parameters. In Theorem 2.1 we assume that $0 < \beta < 1$. The condition
$\beta > 0$ means that the number of neighbours goes to $+\infty$ and then $\|X - x\|_{(k_n, n)} \to 0$.
The condition $\beta < 1$ allows us to apply Lemma 5.4. It is a technical condition. The
condition $\beta > \gamma$ can be understood in this way. When considering Algorithm (2), we
deal with the global learning rate $\gamma_n = n^{-\gamma}$. In Algorithm (3), since for a fixed input $x$
there is not an update at each step $n$, we introduce the effective learning rate in the
following way. At step $k$, $\theta_k(x)$ has a probability of $k^\beta / k$ to be updated. Then, until
time $n$, the algorithm is updated a number of times equal to

\[ N = \sum_{k \le n} k^{\beta - 1}, \]

which is of order $n^\beta$. Thus, up to a constant, there were $N = n$ updates at time $n^{1/\beta}$.
Then, in mean, it is as if the algorithm was defined by

\[ \theta_{k_n}(x) = \theta_{k_n - 1}(x) - \gamma_{k_n} \left( \mathbf{1}_{Y_{k_n} \le \theta_{k_n - 1}(x)} - \alpha \right), \]

with the learning rate

\[ \gamma_{k_n} = \frac{1}{(n^{1/\beta})^{\gamma}} = \frac{1}{n^{\gamma/\beta}}. \]

It is a well-known fact that this algorithm has a good behaviour if, and only if, the
sum

\[ \sum_n \gamma_{k_n} = \sum_n \frac{1}{n^{\gamma/\beta}} \]

is divergent, that is, if and only if $\beta \ge \gamma$. At last, the condition $1/2 < \gamma \le 1$ is a classical
assumption for the Robbins-Monro algorithm to be consistent (see for example [23]).
Here, we restrict the condition to $\gamma < 1$ because we need $1 > \beta > \gamma$.
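
As a rough numerical companion to the heuristic above, the following Python sketch sums the per-step update probabilities $\lfloor k^\beta \rfloor / k$ (the value of $E(P_{k-1})$ given by Lemma 5.1) and compares the total with $n^\beta / \beta$. The function name and the choice β = 0.5, n = 10000 are illustrative assumptions.

```python
import math

def expected_updates(n, beta):
    """Expected number of updates of theta(x) during the first n steps.

    At step k the update probability is floor(k ** beta) / k, i.e. the
    expected proportion of nearest neighbours given by Lemma 5.1.
    """
    return sum(math.floor(k ** beta) / k for k in range(1, n + 1))

# The count grows like n ** beta (up to the constant 1 / beta).
ratio = expected_updates(10_000, 0.5) / (10_000 ** 0.5 / 0.5)
```

The ratio stays slightly below 1, which is consistent with reading the number of updates as being of order $n^\beta$ rather than exactly $n^\beta$.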


2.3. Rate of convergence of the mean square error. Here, we study the rate of
convergence of the mean square error, denoted by $a_n(x) := E\left[(\theta_n(x) - \theta^*(x))^2\right]$.

Theorem 2.2. Under Assumptions A1, A2, A3 and A4, the mean square error $a_n(x)$ of
Algorithm (3) satisfies the following inequality: for all $(\gamma, \beta, \varepsilon)$ such that $0 < \gamma < \beta < 1$
and $1 > \varepsilon > 1 - \beta$, $\forall n \ge N_0 + 1$ where $N_0 =$ ,


n 

an{x)<ex.p{- 2 C 2 {x,a){Kn-HNo))C\+ E exp (- 2 C' 2 (x, a) (a„ - k^)) 4 

k=No+l 



+ Cl exp 
where for $j \in \mathbb{N}^*$, $\kappa_j = \sum_{i=1}^{j} i^{-\varepsilon - \gamma}$ and


dn = Cl exp + 2y/^M{x)C3{d)jn ■ 

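Sketch of proof: The idea of the proof is to establish a recursive inequality on
$a_n(x)$ (following [21]), that is, for $n \ge N_0$,

\[ a_{n+1}(x) \le a_n(x)(1 - c_{n+1}) + d_{n+1}, \]

where for all $n \in \mathbb{N}^*$, $0 < c_n < 1$ and $d_n > 0$. We conclude by using a technical lemma
of Section 5. For this purpose we begin by expanding the square

\[ (\theta_{n+1}(x) - \theta^*(x))^2 = (\theta_n(x) - \theta^*(x))^2 + \gamma_{n+1}^2 \left[ (1 - 2\alpha) \mathbf{1}_{Y_{n+1} \le \theta_n(x)} + \alpha^2 \right] \mathbf{1}_{X_{n+1} \in kNN_{n+1}(x)} - 2 \gamma_{n+1} (\theta_n(x) - \theta^*(x)) \left( \mathbf{1}_{Y_{n+1} \le \theta_n(x)} - \alpha \right) \mathbf{1}_{X_{n+1} \in kNN_{n+1}(x)}. \]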

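Taking the expectation conditionally to $\mathcal{F}_n$, and using the Bayes formula, we get

\[ \mathbb{E}_n\left((\theta_{n+1}(x) - \theta^*(x))^2\right) \le \mathbb{E}_n\left((\theta_n(x) - \theta^*(x))^2\right) + \gamma_{n+1}^2 P_n - 2\gamma_{n+1} (\theta_n(x) - \theta^*(x))\, P_n \left[ F_{Y^{B_n(x)}}(\theta_n(x)) - F_{Y^x}(\theta^*(x)) \right], \tag{5} \]

where $P_n = \mathbb{P}_n(X_{n+1} \in kNN_{n+1}(x))$ as in Lemma 5.1 and $B_n(x)$ is the ball of $\mathbb{R}^d$
centred at $x$ and of radius $\|X - x\|_{(k_{n+1}, n)}$. We rewrite this inequality to make
two different errors appear:

1) First, the quantity $F_{Y^{B_n(x)}}(\theta_n(x)) - F_{Y^x}(\theta_n(x))$ represents the bias error
(made because the sample is biased). Using A1, we get

\[ \left| F_{Y^{B_n(x)}}(\theta_n(x)) - F_{Y^x}(\theta_n(x)) \right| \le M(x)\, \|X - x\|_{(k_{n+1}, n)}, \]

and by A3, $|\theta_n(x) - \theta^*(x)| \le \sqrt{C_1}$. Thus,

\[ -2\gamma_{n+1} (\theta_n(x) - \theta^*(x))\, P_n \left[ F_{Y^{B_n(x)}}(\theta_n(x)) - F_{Y^x}(\theta_n(x)) \right] \le 2\gamma_{n+1} \sqrt{C_1}\, M(x)\, P_n\, \|X - x\|_{(k_{n+1}, n)}. \]

2) The second quantity, $F_{Y^x}(\theta_n(x)) - F_{Y^x}(\theta^*(x))$, represents the on-line learning
error (made by using a stochastic algorithm). Thanks to Assumption A4 we get

\[ (\theta_n(x) - \theta^*(x)) \left[ F_{Y^x}(\theta_n(x)) - F_{Y^x}(\theta^*(x)) \right] \ge C_2(x, \alpha) \left[ \theta_n(x) - \theta^*(x) \right]^2. \]

Taking now the expectation of inequality (5), we get

\[ a_{n+1}(x) \le a_n(x) - 2\gamma_{n+1} C_2(x, \alpha)\, E\left[ (\theta_n(x) - \theta^*(x))^2 P_n \right] + \gamma_{n+1}^2 E(P_n) + 2\gamma_{n+1} M(x) \sqrt{C_1}\, E\left( \|X - x\|_{(k_{n+1}, n)} P_n \right). \]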

This inequality reveals a problem: thanks to Lemmas 5.1 and 5.6 (and so thanks to
Assumption A2) we can deal with the two last terms, but we are not able to compute
$E\left[(\theta_n(x) - \theta^*(x))^2 P_n\right]$. To solve this problem, we use a truncation parameter $\varepsilon_n$. Instead
of writing a recursive inequality on $a_n(x)$, we write such an inequality for the quantity

\[ b_n(x) := E\left[ (\theta_n(x) - \theta^*(x))^2 \mathbf{1}_{P_n > \varepsilon_n} \right]. \]

Choosing $\varepsilon_n = (n+1)^{-\varepsilon}$, we have to tune another parameter, but thanks to A3 and
concentration inequalities (see Lemma 5.4), it is easy to deduce a recursive inequality
on $a_n(x)$ from the one on $b_n(x)$, for $n \ge N_0$.


Comments on the parameters. We choose $0 < \beta < 1$ for the same reasons as in
Theorem 2.1. About $\gamma$, the inequality is true on the entire area $0 < \gamma < 1$ as soon as
$\gamma < \beta$ (which is unusual, as one can see in [16] for example). We will nevertheless see in
the sequel that the validity of the inequality does not by itself imply that $a_n(x)$ converges
to 0. We will discuss later good choices for $(\gamma, \beta)$.


Compromise between the two errors. We can easily see the compromise we have
to make on $\beta$ to deal with the two previous errors. Indeed,

• The bias error gives the term

\[ \exp\left( -2 C_2(x, \alpha) \sum_{k = N_0 + 1}^{n} \frac{1}{k^{\varepsilon + \gamma}} \right) \]

of the inequality. This term decreases to 0 if and only if $\gamma + \varepsilon \le 1$, which implies
$\beta > \gamma$. Then $\beta$ has to be chosen not too small.

• The on-line learning error gives a term in the
remainder. For the remainder to decrease to 0 with the fastest rate, we then need
$\beta$ to be as small as possible compared to 1. Then $\beta$ has to be chosen not too
big.


From this theorem, we can get the rate of convergence of the mean square error. For
that purpose, we have to study the order of the remainder $d_n$ in $n$ to exhibit the dominating
terms. $d_n$ is the sum of three terms. The exponential one is always negligible as soon
as $n$ is big enough because $1 > \varepsilon$. The two others are powers of $n$. Comparing their
exponents, we can find the dominating term as a function of $\gamma$ and $\beta$. Actually, there exist
a rank $N_1(x, d)$ and some constants $C_5(x, d)$ and $C_6(x, d)$ such that, for $n \ge N_0 + 1$,
if /3 < 1 — dy, we get 


if /3 > 1 — d'y, we get 


dn < C^n ^ . 




Plugging this into Theorem 2.2, we deduce the following result.

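Corollary 2.1. Under the assumptions of Theorem 2.2, there exist ranks $N_4(x, \alpha, d)$ and
constants $C_7(x, \alpha, d)$ and $C_8(x, \alpha)$ such that $\forall n \ge N_4(x, \alpha, d)$,

when $\beta > 1 - d\gamma$ and $1 - \beta < \varepsilon < \min\left(1 - \gamma, \left(1 + \frac{1}{d}\right)(1 - \beta)\right)$,

\[ a_n(x) \le \frac{C_7(d, x, \alpha, \varepsilon, \gamma)}{n^{-\varepsilon + \left(1 + \frac{1}{d}\right)(1 - \beta)}}, \]

when $\beta \le 1 - d\gamma$ and $1 - \beta < \varepsilon < \min(1 - \beta + \gamma, 1 - \gamma)$,

\[ a_n(x) \le \frac{C_8(x, \alpha)}{n^{\gamma - \beta + 1 - \varepsilon}}. \]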

Remark 2.3. For other values of $\gamma$ and $\beta$, the derived inequalities do not imply the
convergence to 0 of $a_n(x)$; this is why we do not present them.


From this corollary we can derive optimal choices for $(\beta, \gamma)$, that is, parameters for
which our upper bound on the mean square error decreases with the fastest rate.


Corollary 2.2. Under the same assumptions as in Theorem 2.2, the optimal parameters
are $\gamma = \frac{1}{1+d}$ and $\beta = \gamma + \eta_\beta$ where $\eta_\beta > 0$ is as small as possible. With these
parameters, there exists a constant $C_9(x, \alpha, d)$ such that $\forall n \ge N_4(x, \alpha, d)$,


where 7 = ^ + 7 ^ and 7 ^ 


Unix) < 


Cg{x,a,d) 


1 — (3 — e. 


Comments on the constant $C_9(x, \alpha, d)$. Like all the other constants of this paper,
$C_9(x, \alpha, d)$ has a known explicit expression. An example of values of this constant is
given in Subsection 3.1.

We can notice that the constant $C_9(x, \alpha, d)$ depends on $x$ only through $C_g(x)$ and
$M(x)$. Nevertheless, often in practice, $C_g(x)$ and $M(x)$ do not really depend on $x$ (see
for example Subsection 3.1). In these cases (or when we can easily find bounds on $C_g(x)$
and $M(x)$ which do not depend on $x$), our result is uniform in $x$. Then, it is easy to
deal with the integrated mean square error and conclude that

[ an{x)fxix)dx < . 

Jx n^+‘^ ^ 

When $\alpha$ increases to 1, we try to estimate an extreme quantile. $C_2(x, \alpha)$ becomes smaller
and then $C_9(x, \alpha, d)$ increases. The bound gets worse. We can easily understand this
phenomenon because when $\alpha$ is big, we have a small probability of sampling on the right
of the quantile, and the algorithm is then less powerful.

Let us now comment on the dependency on the dimension $d$. The constant $C_9(x, \alpha, d)$
decreases when the dimension $d$ increases. Nevertheless, this decrease is too small to
balance the behaviour of the rate of convergence, which is in $n^{-\frac{1}{1+d}}$. This is an example
of the curse of dimensionality.

Comment on the rank $N_4(x, \alpha, d)$. This rank is the maximum of four ranks. There
are two kinds of ranks. The ranks $(N_i)_{i \ne 0}$ depend on constants of the problem but are
reasonably small, because the largest of them is the rank after which exponential terms
are smaller than power-of-$n$ terms, or smaller power-of-$n$ terms are smaller than bigger
power-of-$n$ terms. They are then often smaller than $N_0$ in practice. We only need these ranks
to find optimal parameters (and at this stage our reasoning is no longer non-asymptotic).

The rank $N_0$ is completely different. It was introduced in the first theorem because
we could not deal with $a_n(x)$ directly. In fact it is the rank after which the deviation
inequality, allowing us to use $b_n(x)$, is true. It depends on the gap between $\varepsilon$ and $1 - \beta$.
The optimal $\varepsilon$ to obtain the rate of convergence of the previous corollary is $\varepsilon = 1 - \beta + \eta_\varepsilon$
with $\eta_\varepsilon$ as small as possible. The constant $\eta_\varepsilon$ appears in the rank $N_0$ and also in the
rate of convergence (let us suppose that $N_4 = N_0$, which is the case most of the time)

Vn > A^o = exp (2 t/“^) , an{x) = O . 

Then, the smaller $\eta_\varepsilon$ is, the faster the rate of convergence, but also the larger the
rank after which the inequalities are true.

Let us give an example. For a budget of $N = 1000$ calls to the code, one may choose
$\eta_\varepsilon = 0.3$ for the inequality to be theoretically true for $n = N$. Table 1 gives the
theoretical precision for different values of $d$ and compares it with the ideal case where
$\eta_\varepsilon = 0$.


  d                          1       2      3
  $\eta_\varepsilon = 0.3$   0.088   0.28   0.5
  $\eta_\varepsilon = 0$     0.031   0.1    0.17


Table 1. Expected precision for the MSE when N = 1000
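
The entries of Table 1 can be approximately recovered with a short computation. The sketch below assumes that the tabulated precision is $N^{-(1/(1+d) - \eta_\varepsilon/2)}$; this exponent is an inference from the tabulated values themselves (it matches them up to rounding) and should be read as an assumption, as should the function name.

```python
def expected_precision(n_calls, d, eta_eps):
    """Assumed form of the bound: n ** (-(1 / (1 + d) - eta_eps / 2))."""
    return n_calls ** (-(1.0 / (1.0 + d) - eta_eps / 2.0))

# Print one row per dimension: d, value for eta_eps = 0.3, value for eta_eps = 0.
for d in (1, 2, 3):
    print(d,
          round(expected_precision(1000, d, 0.3), 3),
          round(expected_precision(1000, d, 0.0), 3))
```

Under the same assumption, the exponent becomes negative once $1/(1+d) < \eta_\varepsilon/2$, so for $\eta_\varepsilon = 0.3$ the quantity stops decreasing in $n$ from $d = 6$ onwards.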


We can observe that, when $\eta_\varepsilon > 0$, the value of the bound increases with the dimension
faster than when $\eta_\varepsilon = 0$. Moreover, as soon as $\frac{1}{1+d} < \eta_\varepsilon / 2$ ($d = 6$ for our previous
example), the result does not allow us to conclude that $a_n(x)$ decreases to 0 with this
choice of $\eta_\varepsilon$.

Nonetheless, simulations (see next part) seem to show that this difficulty is only an
artifact of our proof (we needed to introduce $\varepsilon_n$ because we do not know how to compute
$E((\theta_n - \theta^*)^2 P_n)$, but the difficulty does not really exist when we implement the algorithm).
In practice, the optimal rate of convergence for optimal parameters is reached early (see
Section 3).


3. Numerical simulations 

In this part we present some numerical simulations to illustrate our results. We
consider simplistic examples so as to be able to evaluate clearly the strengths and the
weaknesses of our algorithm. To begin with, we deal with dimension 1. We study two
stochastic codes.

3.1. Dimension 1 - square function. The first example is the very regular code

\[ g(X, \varepsilon) = X^2 + \varepsilon \]

where $X \sim \mathcal{U}([0, 1])$ and $\varepsilon \sim \mathcal{U}([-0.5, 0.5])$. We try to estimate the quantile of level
$\alpha = 0.95$ for $x = 0.5$ and initialize our algorithm to $\theta_1 = 0.3$. Let us check that our
assumptions are fulfilled in this case. We have $\mathcal{L}(g(x, \varepsilon)) = \mathcal{U}([-\frac{1}{2} + x^2, \frac{1}{2} + x^2])$. Then

Moreover, the code function $g$ takes its values in the compact set $[L_Y, U_Y] = [-\frac{1}{2}, \frac{3}{2}]$. Let
us study Assumption A1. Let $B$ be an interval containing $x$, denoted $B = [x - a, x + b]$
($a > 0$, $b > 0$), then


I-Fys (t) —-Fya; (t)| < 


r-^lBfix,Y)(^,y)dydz 


rt rx-\-o 

J—iJx—a 
< — - - 


fBfx{z)dz 

x-\-b 


^[-|+z 2;|+5;2] 1 [ i+zS. 1 


[ f{x,Y){x,y)dy 
J — OO 

{y)dzdy 


y{B) 


Now, we have to distinguish the cases in function of the localization of t. There are lots 
of cases, but computations are nearly the same. That is why we will develop only one 
case here. When t G [—— j]) we have 


|T'yB(t) — Tya:(t)| < 


rx^b rt 

Jx—a J--, 


4-|+22;i+^2] - 1[-1+22.1+,,2] 


(y) 


a + b 


fx-a ^^z>x(0) + lz<x(t 

a + b 

^ fx-ai*+l-^^)dz 
b + a 

There are again two different cases. Since t G — ^], we always have (t + ^) 2 < x. 

But the position of {t + 1/2)^/^ relative to (x — a) is not always the same. Then, if 
t G — ^(x — a)^], we get 


|T'ys(t) — Ty 2 !(t)| < 


-z^+i)dz 
b + a 

1, x^ (x — a)" 


<(* + 2 )“- + + ^ 

^ / \2 2,2 “ 
< (X — a) a — X a + a x —— 


2 

< —mx H- 


< 0 + rs X X - , 


as $0 < a < 1$. Finally, in this case, A1 is true with $M(x) = 2/3$. The computations are exactly
the same in the other cases and we always find $M(x) \le 2/3$. Assumption
A2 is also satisfied, taking $C_{input} = 1$. We have already explained that Assumption A3 is
true for $[L_Y, U_Y] = [-1/2, 3/2]$. Finally, Assumption A4 is also satisfied with $C_g(x) = 1$
and $C_2(x, \alpha) = 0.02$.
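
For this example the target and one of the constants can be computed by hand, which gives a convenient sanity check for any implementation. The Python sketch below uses hypothetical helper names; it relies only on the fact that the conditional law is uniform on $[x^2 - 1/2, x^2 + 1/2]$ and on the expression of $\sqrt{C_1}$ given with Assumption A3.

```python
def conditional_quantile_square_code(x, alpha):
    """alpha-quantile of g(x, eps) = x^2 + eps with eps ~ U([-0.5, 0.5]).

    The law is uniform on an interval of length 1 starting at x^2 - 0.5,
    so the quantile is the left endpoint plus alpha.
    """
    return x ** 2 - 0.5 + alpha

def sqrt_c1(l_y, u_y, alpha):
    """Bound sqrt(C_1) = max(U_Y - L_Y + (1 - alpha), U_Y + alpha - L_Y)."""
    return max(u_y - l_y + (1.0 - alpha), u_y + alpha - l_y)

theta_star = conditional_quantile_square_code(0.5, 0.95)   # target at x = 0.5
bound = sqrt_c1(-0.5, 1.5, 0.95)                            # with [L_Y, U_Y] = [-1/2, 3/2]
```

Here the target quantile at $x = 0.5$ is 0.70, and the bound equals $U_Y - L_Y + \alpha = 2.95$ since $\alpha \ge 1/2$.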


3.1.1. Almost sure convergence. Let us first deal with the almost sure convergence. We
plot in Figure 1, for $(\beta, \gamma) \in [0, 1]^2$, the relative error of the algorithm. The best parameters
are clearly in the area $\beta > \gamma > 1/2$. We can even observe that for $\beta \approx 1$, $\beta < \gamma$
or $\gamma < 1/2$, the algorithm does not converge almost surely (or converges very slowly). This
is in accordance with our theoretical results. Nevertheless, we can observe a kind of
continuity for $\gamma$ around $1/2$: in practice, the convergence becomes really slow only
when $\gamma$ is significantly far away from $1/2$.


[Figure: relative error as a function of Beta and Gamma, n=5000; horizontal axis: Beta]


Figure 1. Relative error for $n = 5000$ as a function of $\beta$ and $\gamma$


3.1.2. Mean Square Error (MSE). Let us study the best choice of $\beta$ and $\gamma$ in terms of
$L^2$-convergence. We plot in Figure 2 the mean square error as a function of $\gamma$ and $\beta$ (we
estimate the MSE by a Monte Carlo method of 100 iterations).


[Figure: mean square error, n=50; horizontal axis: Beta]


Figure 2. Mean square error as a function of $\beta$ and $\gamma$ for the square function

Simulations confirm that the theoretical optimal area $\gamma = 0.5$ and $\beta = \gamma + \eta_\beta$ gives
the smallest MSE. Nevertheless, it seems that in practice we can relax the condition that
the gap $\eta_\beta$ between $\beta$ and $\gamma$ is as small as possible. Indeed, when $\eta_\beta$ is reasonably big,
simulations show that we are still in the optimal area.









3.1.3. Theoretical bound. In this case, we have at hand all the parameters to compute
the theoretical bound of our theorems. In particular, in Corollary 2.2, we get

\[ a_n(x) \le \frac{C_9(x, d, \alpha)}{n^{\frac{1}{1+d} - \eta}}. \]

Table 2 summarizes the values of the constants needed to compute the theoretical bound
in this case.


  Constant   $\alpha$   $M(x)$   $C_{input}$   $C_g(x)$   $C_2(x, \alpha)$   $U_Y - L_Y$
  Value      0.95       2/3      1             1          0.02               2

  Constant   $\sqrt{C_1}$   $C_3(d)$   $C_4(d)$   $C_5(x, d)$   $C_6(x, d)$   $C_9(x, d, \alpha)$
  Value      2.95           7.39       2          1.95          12            180


Table 2. Constant values


For $N = 1000$, we obtain the bound $a_N(x) \le 5.8$, which is overly pessimistic compared
to the practical results. We can then think of a way to improve this bound. First of
all, the constant $C_2(x, \alpha)$ is in fact not so small. Indeed, we have to take a margin in
the proof, for the case where $\theta_n$ goes out of $[L_Y, U_Y]$. This happens only with a very
small probability. If we do not take this case into account, we have $C_2(x, \alpha) = 1$. Then
$C_9(x, \alpha, d) \approx 3.7$ and then, for $N = 1000$, the bound is 0.11. Practical results are still
better (we can observe that for $n = 50$ only, we have an MSE smaller than 0.05), but the
gap is less important.

3.2. Dimension 1 - absolute value function. Let us see what happens when the
function $g$ is less smooth with respect to the first variable. We study the code

\[ g(X, \varepsilon) = |X| + \varepsilon, \]

where $X \sim \mathcal{U}([-1, 1])$ and $\varepsilon \sim \mathcal{U}([-0.5, 0.5])$. We want to study the conditional quantile
at $x = 0$ (the point at which differentiability fails). Assumptions can be checked as
above. Since the almost sure convergence holds and gives the same kind of plots
as in the previous case, we only study the convergence of the MSE. For that purpose, we
plot in Figure 3 the MSE (estimated by 100 iterations of Monte Carlo simulations) as a
function of $\gamma$ and $\beta$, for $n = 300$ (the lack of smoothness forces us to make more iterations
to have a sufficient precision) and $\theta_1 = 0.3$. Conclusions are the same as in the
previous example concerning the best parameters. Nevertheless, we can observe that
the lack of smoothness implies some strange behaviour around $\gamma = 1$.

3.3. Dimensions 2 and 3. In dimension $d$, we showed that the theoretical optimal
parameters are $\gamma = \frac{1}{1+d}$ and $\beta = \gamma + \eta_\beta$. To see what happens in practice, we still plot Monte
Carlo estimations (200 iterations) of the MSE as a function of $\gamma$ and $\beta$.

3.3.1. Dimension 2. In dimension 2, we study two codes : 

gi{X, e) = ||A||2 + e and g 2 {X, e)=Xl + X 2 + e, 

where $X = (X_1, X_2) \sim \mathcal{U}([-1, 1]^2)$ and $\varepsilon \sim \mathcal{U}([-0.5, 0.5])$. In each case, we choose
$n = 400$, study the quantile at the input point $x = (0, 0)$ and initialize our
algorithm at $\theta_1 = 0.3$. In Figure 4 we can see that $\beta = 1$ and $\gamma = 1$ are still really
bad parameters. As in our theoretical results, $\gamma = \frac{1}{1+d}$ seems to be the best choice.
Nevertheless, even if it is clear that $\beta < \gamma$ is a bad choice, the experiments seem to
show that the best parameter $\beta$ is strictly greater than $\gamma$, by a larger margin than in the
theoretical case, where we take $\beta$ as close as possible to $\gamma$. As we said before, in practice, $N_0$ seems
not to be the true limit rank. Indeed, with only $n = 400$ iterations, in this case, the
MSE for the optimal parameters reaches 0.06.


[Figure: mean square error, n=300; horizontal axis: Beta]


Figure 3. MSE as a function of $\beta$ and $\gamma$ for the absolute value function


[Figure: two panels, "Mean square error, n=400, d=2, norm" and "Mean square error, n=400, d=2"; horizontal axis: Beta]


Figure 4. Mean square error as a function of $\beta$ and $\gamma$

3.3.2. Dimension 3. In dimension 3, we study the two codes 

<71 e) = I All^ + f S'lid g 2 {X, e) = Xf + X 2 H — ^ + e , 

where $X = (X_1, X_2, X_3) \sim \mathcal{U}([-1, 1]^3)$ and $\varepsilon \sim \mathcal{U}([-0.5, 0.5])$. In each case, we choose
$n = 500$ and study the quantile at the input point $(0, 0, 0)$. The interpretation
of Figure 5 is the same as in dimension 2. The scale is not the same, and the convergence
is slower again, but with $n = 500$ we nevertheless obtain an MSE of 0.10.


[Figure: two panels, "Mean square error, n=500, d=3, norm" and "Mean square error, n=500, d=3"; horizontal axis: Beta]


Figure 5. Mean square error as a function of $\beta$ and $\gamma$


4. Conclusion and perspectives 

In this paper, we proposed a sequential method for the estimation of a conditional
quantile of the output of a stochastic code whose inputs lie in $\mathbb{R}^d$. We introduced a
combination of $k$-nearest neighbours and a Robbins-Monro estimator. The resulting
algorithm has two parameters to tune: the number of neighbours $k_n = \lfloor n^\beta \rfloor$ and
the learning rate $\gamma_n = n^{-\gamma}$. Obtaining a bias-variance decomposition of the risk, we
showed that our algorithm is convergent for $1/2 < \gamma < \beta < 1$ and we studied the
non-asymptotic rate of convergence of its mean square error. Moreover, we proved that we have to
choose $\gamma = \frac{1}{1+d}$ and $\beta = \gamma + \eta_\beta$ ($\eta_\beta > 0$) to get the best rate of convergence. Numerical
simulations have shown that our algorithm with the theoretical optimal parameters is really
powerful to estimate a conditional quantile, even in dimension $d > 1$.

The theoretical guarantees are shown under strong technical assumptions, but our
algorithm is a general methodology to solve the problem. Relaxing the conditions will
be the object of future work. Moreover, the proof that we propose forced us to use
an artificial parameter $\varepsilon$, which implies that the non-asymptotic inequality is theoretically
true only for big $n$, even if simulations suggest that this problem does not exist in practice.
A second perspective is then to find a better way to prove this inequality for smaller $n$.

Finally, a very interesting direction for future work is to establish non-asymptotic lower
bounds for the mean square error of our algorithm.

5. Appendix 1 : Technical lemmas and proofs 

5.1. Technical lemmas and notation. For the sake of completeness, we start by recalling
and proving some well-known facts on order statistics.

Lemma 5.1. When $X$ has a density, denoting $P_n = \mathbb{P}(X \in kNN_{n+1}(x) \mid X_1, \ldots, X_n)$, we
have the following properties:

1) $P_n = F_{\|X-x\|}\left(\|X-x\|_{(k_{n+1},n)}\right)$

2) $P_n \sim \beta(k_{n+1}, n - k_{n+1} + 1)$

3) $\mathbb{E}(P_n) = \frac{k_{n+1}}{n+1}$

4) $\mathbb{E}(P_n^2) = \frac{k_{n+1}(k_{n+1}+1)}{(n+1)(n+2)}$

where we denote by $F_{\|X-x\|}$ the cumulative distribution function of the random variable
$\|X - x\|$, by $\|X - x\|_{(k_{n+1},n)}$ the $k_{n+1}$-th order statistic of the sample $(\|X_1 - x\|, \ldots, \|X_n - x\|)$,
and by $\beta(k_{n+1}, n - k_{n+1} + 1)$ the beta distribution of parameters $k_{n+1}$ and $n - k_{n+1} + 1$.

Proof. Conditionally on $X_1, \ldots, X_n$, the event $\{X \in kNN_{n+1}(x)\}$ is equivalent to the
event $\{\|X - x\| \leq \|X - x\|_{(k_{n+1},n)}\}$. Then,

$$P_n = \mathbb{P}(X \in kNN_{n+1}(x) \mid X_1, \ldots, X_n)
= \mathbb{P}\left(\|X - x\| \leq \|X - x\|_{(k_{n+1},n)} \mid X_1, \ldots, X_n\right)
= F_{\|X-x\|}\left(\|X - x\|_{(k_{n+1},n)}\right).$$

Since $X$ has a density, the cumulative distribution function $F_{\|X-x\|}$ is continuous.
Indeed, using the sequential characterization we get, for a sequence $(t_n)$ converging to $t$,

$$F_{\|X-x\|}(t_n) = \mathbb{P}(X \in B_d(x,t_n)) = \int f(z) \mathbf{1}_{B_d(x,t_n)}(z)\,dz.$$

Since $f$ is integrable, the Lebesgue dominated convergence theorem allows us to conclude that

$$\lim_n \int f(z) \mathbf{1}_{B_d(x,t_n)}(z)\,dz = \int \lim_n f(z) \mathbf{1}_{B_d(x,t_n)}(z)\,dz = \mathbb{P}(X \in B_d(x,t)),$$

so the cumulative distribution function is continuous. Then, thanks to classical results on
order statistics and the quantile transform (see M), we get

$$P_n = F_{\|X-x\|}\left(\|X - x\|_{(k_{n+1},n)}\right) \sim U_{(k_{n+1},n)} \sim \beta(k_{n+1}, n - k_{n+1} + 1),$$

where $U_{(k_{n+1},n)}$ denotes the $k_{n+1}$-th order statistic of an independent sample of size $n$
distributed uniformly on $[0,1]$. □
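
The distributional statement of Lemma 5.1 can be checked numerically. The sketch below is only an illustration (the function names, the sample sizes and the seed are assumptions): it simulates the $k$-th order statistic of $n$ uniform draws and compares its first two empirical moments with the standard moments of the $\beta(k, n-k+1)$ distribution, namely $\frac{k}{n+1}$ and $\frac{k(k+1)}{(n+1)(n+2)}$.

```python
import random


def kth_uniform_order_statistic(k, n, rng):
    """Return the k-th smallest of n independent U[0, 1] draws."""
    return sorted(rng.random() for _ in range(n))[k - 1]


def empirical_moments(k, n, reps=20000, seed=0):
    """Monte Carlo estimates of E[P] and E[P^2] for P = U_(k, n)."""
    rng = random.Random(seed)
    draws = [kth_uniform_order_statistic(k, n, rng) for _ in range(reps)]
    first = sum(draws) / reps
    second = sum(u * u for u in draws) / reps
    return first, second


def beta_moments(k, n):
    """Exact first two moments of the Beta(k, n - k + 1) distribution."""
    return k / (n + 1), k * (k + 1) / ((n + 1) * (n + 2))


if __name__ == "__main__":
    print(empirical_moments(5, 50))
    print(beta_moments(5, 50))
```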

Let us now recall some deviation results.


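Lemma 5.2. We denote by $B(n,p)$ the binomial distribution of parameters $n$ and $p$, for
$n \geq 1$ and $p \in [0,1]$. Then, if $Z \sim B(n,p)$, we get

$$\mathbb{P}\left(\frac{Z}{n} < \frac{p}{2}\right) \leq \exp\left(-\frac{3np}{32}\right), \qquad
\mathbb{P}\left(\frac{Z}{n} > 2p\right) \leq \exp\left(-\frac{3np}{8}\right).$$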

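Proof. Let $(Z_i)$ be an independent sample of Bernoulli variables of parameter $p$ and let
$\bar{Z} = \frac{1}{n}\sum_{i=1}^{n} Z_i$. We apply Bernstein's inequality (see Theorem 8.2 of m) to conclude that, for $0 < \epsilon \leq 1$,

$$\mathbb{P}(\bar{Z} - p \leq -\epsilon p) \leq \exp\left(-\frac{3np\epsilon^2}{8}\right),$$

$$\mathbb{P}(\bar{Z} - p \geq \epsilon p) \leq \exp\left(-\frac{3np\epsilon^2}{8}\right).$$

The results follow by taking $\epsilon = \frac{1}{2}$ in the first case and $\epsilon = 1$ in the second case. □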
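
As a sanity check of these deviation bounds, the exact binomial tails can be compared with the exponential bounds. This is a hedged sketch: the helper names are hypothetical, and the constants $3/8$ and $3/32$ are the ones obtained from Bernstein's inequality with $\epsilon = 1$ and $\epsilon = \frac{1}{2}$, which is an assumption about the exact form of the statement.

```python
import math


def binom_pmf(n, p, j):
    """P(Z = j) for Z ~ B(n, p)."""
    return math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)


def upper_tail(n, p):
    """Exact P(Z / n > 2p)."""
    return sum(binom_pmf(n, p, j) for j in range(n + 1) if j > 2 * n * p)


def lower_tail(n, p):
    """Exact P(Z / n < p / 2)."""
    return sum(binom_pmf(n, p, j) for j in range(n + 1) if j < n * p / 2)


def bernstein_bound(n, p, eps):
    """exp(-3 n p eps^2 / 8), the bound assumed here for 0 < eps <= 1."""
    return math.exp(-3.0 * n * p * eps ** 2 / 8.0)


if __name__ == "__main__":
    for n in (10, 50, 200):
        for p in (0.05, 0.2, 0.4):
            print(n, p, upper_tail(n, p), bernstein_bound(n, p, 1.0),
                  lower_tail(n, p), bernstein_bound(n, p, 0.5))
```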

We now give some technical lemmas useful to prove our main results.

Lemma 5.3. Suppose $\beta > \gamma$. Then, for $C \geq 3$, we get

$$\mathbb{P}\left(\sum_{n} \gamma_n \mathbf{1}_{X_n \in kNN_n(x)} < C\right) = 0.$$


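Proof. First, it is a well-known result (see IH!) that if $U \sim \mathcal{U}([0,1])$, then $X \overset{\mathcal{L}}{=} F^{-1}(U)$.
Since $F$ is non-decreasing, we get

$$\mathbf{1}_{U_n \in kNN_n(x)} = \mathbf{1}_{F^{-1}(U_n) \in kNN_n(F^{-1}(x))} \quad a.s.$$

Hence it is enough to show the result for $X \sim \mathcal{U}([0,1])$.

Let $x$ be a real number in $[0,1]$. Let $\epsilon$ be a positive real number. Let $n_0$ be an integer
such that

$$(6) \qquad \sum_{n \geq n_0} \exp\left(-\frac{3k_n}{16}\right) \leq \epsilon.$$

Let $n_1$ be the integer such that $n_1 = 1$ if $x \in \{0,1\}$ and, if $x \in ]0,1[$, for $n \geq n_1$,

$$x + \frac{k_n}{2n} \leq 1, \qquad x - \frac{k_n}{2n} \geq 0.$$

We denote $N := \max(n_0, n_1)$. We set

$$\Omega := \left\{ \forall n \geq N, \ \sum_{j=1}^{n} \mathbf{1}_{|X_j - x| \leq \frac{k_n}{4n}} \leq k_n \right\}.$$

On this event, for every $n \geq N$, there are at most $k_n$ elements $X_i$ such that $|X_i - x|$ is
at most $\frac{k_n}{4n}$. Thus, if an element satisfies $|X_j - x| \leq \frac{k_n}{4n}$, it belongs to the $k_n$-nearest
neighbours of $x$. Then,

$$(7) \qquad \mathbb{P}(\Omega^c) \leq \sum_{n \geq N} \mathbb{P}\left(\sum_{i=1}^{n} \mathbf{1}_{|X_i - x| \leq \frac{k_n}{4n}} > k_n\right) =: \sum_{n \geq N} \mathbb{P}(Z_n > k_n),$$

where $Z_n \sim B(n,p)$ and, since $n \geq n_1$,

$$p = \mathbb{P}\left(|X - x| \leq \frac{k_n}{4n}\right) =
\begin{cases}
\mathbb{P}\left(-\frac{k_n}{4n} + x \leq X \leq \frac{k_n}{4n} + x\right) & \text{if } x \in ]0,1[ \\
\mathbb{P}\left(X \leq \frac{k_n}{4n}\right) & \text{if } x = 0 \\
\mathbb{P}\left(X \geq 1 - \frac{k_n}{4n}\right) & \text{if } x = 1
\end{cases}
=
\begin{cases}
\frac{k_n}{2n} & \text{if } x \in ]0,1[ \\
\frac{k_n}{4n} & \text{otherwise}
\end{cases}
\leq \frac{k_n}{2n}.$$

Then, Equation (7) gives

$$(8) \qquad \mathbb{P}(\Omega^c) \leq \sum_{n \geq N} \mathbb{P}\left(\sum_{j=1}^{n} \mathbf{1}_{|X_j - x| \leq \frac{k_n}{4n}} > k_n\right)
\leq \sum_{n \geq N} \mathbb{P}\left(B\left(n, \frac{k_n}{2n}\right) > k_n\right)
\leq \sum_{n \geq N} \exp\left(-\frac{3k_n}{16}\right) \leq \epsilon,$$

where we used the second inequality of Lemma 5.2 and Equation (6). But, as we
noticed above, on the event $\Omega$, we have

$$\mathbf{1}_{X_n \in kNN_n(x)} \geq \mathbf{1}_{|X_n - x| \leq \frac{k_n}{4n}}.$$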
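Finally,

$$(9) \qquad \mathbb{P}\left(\sum_{n \geq N} \gamma_n \mathbf{1}_{X_n \in kNN_n(x)} < C\right) \leq \mathbb{P}(\Omega^c) + \mathbb{P}\left(\sum_{n \geq N} \gamma_n \mathbf{1}_{|X_n - x| \leq \frac{k_n}{4n}} < C\right).$$

Let now $(I_k)_k$ be a partition of $[|N, +\infty|]$ such that

$$\forall k \geq 1, \quad \sum_{n \in I_k} \gamma_n \mathbb{P}\left(|X_n - x| \leq \frac{k_n}{4n}\right) \in [2C, 2C+1].$$

Such a partition exists since, as $\beta > \gamma$, the sum $\sum_n \gamma_n \frac{k_n}{n}$ is divergent. Then, by independence and since $\gamma_n \leq 1$,

$$\mathrm{Var}\left(\sum_{n \in I_k} \gamma_n \mathbf{1}_{|X_n - x| \leq \frac{k_n}{4n}}\right) \leq \mathbb{E}\left(\sum_{n \in I_k} \gamma_n^2 \mathbf{1}_{|X_n - x| \leq \frac{k_n}{4n}}\right) \leq 2C + 1.$$

Chebyshev's inequality gives

$$\mathbb{P}\left(\sum_{n \in I_k} \gamma_n \mathbf{1}_{|X_n - x| \leq \frac{k_n}{4n}} < C\right) \leq \frac{2C+1}{C^2} \leq \frac{7}{9},$$

since $C \geq 3$. The blocks $I_k$ being disjoint, the corresponding sums are independent and

$$(10) \qquad \mathbb{P}\left(\sum_{n \geq N} \gamma_n \mathbf{1}_{|X_n - x| \leq \frac{k_n}{4n}} < C\right) \leq \mathbb{P}\left(\bigcap_{k} \left\{\sum_{n \in I_k} \gamma_n \mathbf{1}_{|X_n - x| \leq \frac{k_n}{4n}} < C\right\}\right) \leq \lim_{K \to \infty} \left(\frac{7}{9}\right)^{K} = 0.$$

Thanks to (8), (9) and (10), we get

$$\mathbb{P}\left(\sum_{n \geq N} \gamma_n \mathbf{1}_{X_n \in kNN_n(x)} < C\right) \leq \epsilon + 0 \leq \epsilon,$$

which holds for all $\epsilon > 0$. □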


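Lemma 5.4. Denoting by $A_n$ the event $\{X_1, \ldots, X_n \mid P_n > \epsilon_n\}$, where $\epsilon_n = \frac{1}{(n+1)^{\epsilon}}$ and the
parameter $\epsilon$ satisfies $1 > \epsilon > 1 - \beta$, we have for $n \geq 1$,

$$\mathbb{P}(A_n^c) \leq \exp\left(-\frac{3(n+1)^{1-\epsilon}}{8}\right).$$

Proof. Thanks to Lemma 5.1 we obtain

$$\mathbb{P}(A_n^c) = \mathbb{P}\left(\beta(k_{n+1}, n - k_{n+1} + 1) \leq \epsilon_n\right) = I_{\epsilon_n}(k_{n+1}, n - k_{n+1} + 1),$$

where we denote by $I_{\epsilon}$ the incomplete $\beta$ function. A classical result (see m) allows us to
write this quantity in terms of the binomial distribution:

$$\mathbb{P}(A_n^c) = \mathbb{P}(B(n, \epsilon_n) \geq k_{n+1}).$$

Thanks to Lemma 5.2 we know that

$$\mathbb{P}(B(n+1, \epsilon_n) \geq k_{n+1}) \leq \exp\left(-\frac{3(n+1)\epsilon_n}{8}\right) \leq \exp\left(-\frac{3(n+1)^{1-\epsilon}}{8}\right)$$

as soon as $k_{n+1}/(n+1) \geq 2\epsilon_n$, which is true as soon as $n \geq 2^{\frac{1}{\epsilon - (1-\beta)}}$ because $\epsilon > 1 - \beta$.

□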


Lemma 5.5. Under the hypotheses of Theorem 2.1, $\|X - x\|_{(k_{n+1},n)}$ converges almost surely
to 0.

Proof. Let $u$ be a positive number.

$$(11) \qquad p_u := \mathbb{P}(X \in B(x,u)) = \int_{B(x,u)} f(t)\,dt \geq C_{input}\, \mu_{Leb}(B(x,u)) = C_{input} \frac{\pi^{d/2}}{\Gamma\left(\frac{d}{2}+1\right)} u^d = C_{input} C_4(d) u^d =: q_u.$$

Let $Z$ be a random variable of law $B(n, p_u)$. Since $\|X - x\|_{(k_{n+1},n)} > u$ implies that
there are at most $k_{n+1}$ elements of the sample which satisfy $X \in B(x,u)$, we get:

$$\mathbb{P}\left(\|X - x\|_{(k_{n+1},n)} > u\right) = \mathbb{P}(Z < k_{n+1}).$$

Thanks to Equation (11), and denoting by $\tilde{Z}$ a random variable of law $B(n, q_u)$, we have

$$\mathbb{P}\left(\|X - x\|_{(k_{n+1},n)} > u\right) \leq \mathbb{P}(\tilde{Z} < k_{n+1}).$$

Lemma 5.2 implies that $\mathbb{P}(\|X - x\|_{(k_{n+1},n)} > u)$ is the general term of a convergent sum.
Indeed, when $n$ is large enough, $k_{n+1}/n < q_u/2$, because $k_{n+1}/n$ converges to 0
($\beta < 1$). The Borel-Cantelli Lemma then implies that $\|X - x\|_{(k_{n+1},n)}$ converges almost
surely to 0. □


Lemma 5.6. With the same notation as above,

$$\mathbb{E}\left(P_n \|X - x\|_{(k_{n+1},n)}\right) \leq C_3(d) \left(\frac{k_{n+1}}{n+1}\right)^{1+\frac{1}{d}}.$$

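Proof. Let us denote by $F$ and $f$ the cumulative distribution function and the density of the
law of $\|X - x\|$.

$$\mathbb{E}\left(\|X - x\|_{(k_{n+1},n)} P_n\right) = \mathbb{E}\left(\|X - x\|_{(k_{n+1},n)} F\left(\|X - x\|_{(k_{n+1},n)}\right)\right)
= \int y F(y) f_{\|X-x\|_{(k_{n+1},n)}}(y)\,dy,$$

with

$$f_{\|X-x\|_{(k_{n+1},n)}}(y) = \frac{n!}{(k_{n+1}-1)!\,(n-k_{n+1})!} F(y)^{k_{n+1}-1} \left(1 - F(y)\right)^{n-k_{n+1}} f(y).$$

Then we get

$$\mathbb{E}\left(\|X - x\|_{(k_{n+1},n)} P_n\right) = \frac{n!}{(k_{n+1}-1)!\,(n-k_{n+1})!} \cdot \frac{k_{n+1}!\,(n-k_{n+1})!}{(n+1)!}\, \mathbb{E}\left(\|X - x\|_{(k_{n+1}+1,n+1)}\right)
= \frac{k_{n+1}}{n+1}\, \mathbb{E}\left(\|X - x\|_{(k_{n+1}+1,n+1)}\right).$$

We denote by $U_{\|\cdot\|}$ the upper bound of the support of $\|X - x\|$, and write

$$\mathbb{E}\left(\|X - x\|_{(k_{n+1}+1,n+1)}\right) = \int_0^{U_{\|\cdot\|}} \mathbb{P}\left(\|X - x\|_{(k_{n+1}+1,n+1)} > u\right) du.$$

Using the same arguments as in Lemma 5.5 and denoting $C_{10}(d) = \left(\frac{2(k_{n+1}+1)}{(n+1)C_{input}C_4(d)}\right)^{1/d}$,

$$I := \int_0^{U_{\|\cdot\|}} \mathbb{P}\left(\|X - x\|_{(k_{n+1}+1,n+1)} > u\right) du = \int_0^{U_{\|\cdot\|}} \mathbb{P}\left(B(n+1, q_u) < k_{n+1}+1\right) du$$

$$\leq \int_0^{C_{10}(d)} 1\,du + \int_{C_{10}(d)}^{+\infty} \mathbb{P}\left(B(n+1, q_u) < k_{n+1}+1\right) du
\leq C_{10}(d) + \int_{C_{10}(d)}^{+\infty} \exp\left(-\frac{3(n+1)C_{input}C_4(d)u^d}{32}\right) du,$$

where we use Lemma 5.2 in the second integral because $u \geq C_{10}(d)$ implies

$$\frac{k_{n+1}+1}{n+1} \leq \frac{q_u}{2}.$$

Then, we obtain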
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Jciiid) \ ‘62 J 

6{n + l)CinputC 4 :{d)u‘^ 


< Cio{d) + f 
Jo 

= Cioid) 


+00 ^d-1 


0 Cio(d) 

Cii{d) 32 


exp - 




32 


exp 


du 


6{n + l)C'i„putC'4(d)u“ 
32 


d \ 1 +°° 


(■ ( 1 I 3(n + l)dC'mpu4C'4(d) ^ 

- I ^ 


2(fc„4-i + 1) 


1 + 


16 


{n+l)CinputCA{d) \ 6d{k n+l + 1) 

dj kn+l 


< 


n + l 


dI ^n+1 


/ " 

/ kn+1 +1 / 

1 C'inputC'4{d) y 

^n+1 ^ 


1 + 


16 


6d{kn+i + 1) 


^ y ^input^Aid) 

-Csid) ^ 




d «'«+! 


n + 1 ’ 

because for n > 1, we get kn > 1- 


□ 


Lemma 5.7. Let $(b_n)$ be a real sequence. If there exist sequences $(c_n)_{n \geq 1} \in [0,1]^{\mathbb{N}}$ and
$(d_n)_{n \geq 1} \in ]0,+\infty[^{\mathbb{N}}$ such that

$$\forall n \geq N_0, \quad b_{n+1} \leq b_n(1 - c_{n+1}) + d_{n+1},$$

then for all $n \geq N_0 + 1$,

$$b_n \leq \exp\left(-\sum_{k=N_0+1}^{n} c_k\right) b_{N_0} + \sum_{k=N_0+1}^{n} \exp\left(-\left(\sum_{j=1}^{n} c_j - \sum_{j=1}^{k} c_j\right)\right) d_k.$$

Proof. This inequality appears in m and references therein. It can be proved by induction
using that $\forall x \in ]0,+\infty[$, $\exp(x) \geq 1 + x$. □
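
The bound of Lemma 5.7 can be illustrated numerically on the equality case of the recursion. The sketch below is an assumption-laden check, not part of the proof: it takes $N_0 = 0$, a non-negative initial value $b_0$ (as in the application to a mean square error), and hypothetical sequences $c_k = k^{-0.8}$ and $d_k = k^{-1.5}$.

```python
import math


def iterate_recursion(b0, c, d):
    """Run b_{n+1} = b_n * (1 - c_{n+1}) + d_{n+1}; c[0] stands for c_1."""
    b = [b0]
    for ck, dk in zip(c, d):
        b.append(b[-1] * (1.0 - ck) + dk)
    return b


def lemma_bound(b0, c, d):
    """Right-hand side of the bound for n = 1..len(c), with N_0 = 0."""
    bounds = []
    for n in range(1, len(c) + 1):
        head = math.exp(-sum(c[:n])) * b0
        # sum(c[k:n]) is c_{k+1} + ... + c_n, and d[k - 1] is d_k.
        tail = sum(math.exp(-sum(c[k:n])) * d[k - 1] for k in range(1, n + 1))
        bounds.append(head + tail)
    return bounds


if __name__ == "__main__":
    c = [k ** -0.8 for k in range(1, 201)]
    d = [k ** -1.5 for k in range(1, 201)]
    b = iterate_recursion(1.0, c, d)
    print(b[-1], lemma_bound(1.0, c, d)[-1])
```

The check relies only on $1 - c \leq e^{-c}$, which is the same elementary inequality as the one used in the induction.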

Let us first prove the following consequence of Assumption A3.

Lemma 5.8. Under assumption A3, if $\beta > \gamma$, then for all $x$ and for all $n \geq 1$,

$$\theta_n(x) \in [L_Y - (1 - \alpha), U_Y + \alpha], \quad a.s.$$

Proof. Suppose that $\theta_n(x)$ leaves the compact set $[L_Y, U_Y]$ by the right at step $N_0$.
By definition, $\theta_{N_0-1} \leq U_Y$ and consequently $\theta_{N_0} \leq U_Y + \alpha\gamma_{N_0}$. At the next step, since
$\theta_{N_0} > U_Y$, we have $Y_{N_0+1} \leq \theta_{N_0}$ and then

$$\theta_{N_0+1} \leq U_Y + \alpha\gamma_{N_0} - (1 - \alpha)\gamma_{N_0+1}\mathbf{1}_{X_{N_0+1} \in kNN_{N_0+1}(x)}.$$

Then, the algorithm either does not move (if $X_{N_0+1} \notin kNN_{N_0+1}(x)$) or comes back in
the direction of $[L_Y, U_Y]$ with a step of $(1 - \alpha)\gamma_{N_0+1}$. Then, if

$$\sum_{n \geq 0} \gamma_n \mathbf{1}_{X_n \in kNN_n(x)} = +\infty \quad a.s.,$$

the algorithm almost surely comes back to the compact set $[L_Y, U_Y]$. Thanks to Lemma
5.3, we know that, since $\beta > \gamma$, the previous sum diverges almost surely. A similar result
holds when the algorithm leaves the compact set by the left, and finally we have shown
that almost surely,

$$\theta_n(x) \in [L_Y - (1 - \alpha), U_Y + \alpha] =: [L_{\theta_n}, U_{\theta_n}].$$

□


5.2. Proof of Theorem 2.1: almost sure convergence. To prove this theorem, we
adapt the classical analysis of the Robbins-Monro algorithm (see 0). In the sequel we
do not write $\theta_n(x)$ but $\theta_n$ to make the notation less cluttered.


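5.2.1. Martingale decomposition. In the sequel, we still denote $H(\theta_n, X_{n+1}, Y_{n+1}) :=
(\mathbf{1}_{Y_{n+1} \leq \theta_n} - \alpha)\mathbf{1}_{X_{n+1} \in kNN_{n+1}(x)}$, $\mathcal{F}_n = \sigma(X_1, \ldots, X_n, Y_1, \ldots, Y_n)$, and $\mathbb{P}_n$ and $\mathbb{E}_n$ the probability
and expectation conditionally on $\mathcal{F}_n$. We introduce

$$h_n(\theta_n) := \mathbb{E}\left(H(\theta_n, X_{n+1}, Y_{n+1}) \mid \mathcal{F}_n\right)
= \mathbb{P}_n\left(Y_{n+1} \leq \theta_n \cap X_{n+1} \in kNN_{n+1}(x)\right) - \alpha\,\mathbb{P}_n\left(X_{n+1} \in kNN_{n+1}(x)\right)
= P_n\left[F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n) - F_{Y^x}(\theta^*)\right].$$

Then,

$$T_n = \theta_n + \sum_{j=1}^{n} \gamma_j h_{j-1}(\theta_{j-1}) = \theta_0(x) - \sum_{j=1}^{n} \gamma_j \xi_j,$$

with $\xi_j = H(\theta_{j-1}, X_j, Y_j) - h_{j-1}(\theta_{j-1})$, is a martingale. It is bounded in $L^2(\mathbb{R})$. Since

$$\sup_n |\xi_n| \leq \alpha + (1 + \alpha) = 1 + 2\alpha,$$

the Burkholder inequality gives the existence of a constant $C$ such that

$$\mathbb{E}(|T_n|^2) \leq \mathbb{E}\left(\Big|\sum_{j=1}^{n} \gamma_j \xi_j\Big|^2\right) \leq C\,\mathbb{E}\left(\sum_{j=1}^{n} \gamma_j^2 \xi_j^2\right) \leq C(1 + 2\alpha)^2 \sum_{j=1}^{n} \gamma_j^2 < \infty.$$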

5.2.2. The sequence $(\theta_n)$ converges almost surely. First, let us prove that

$$(12) \qquad \mathbb{P}(\theta_n \to \infty) + \mathbb{P}(\theta_n \to -\infty) = 0.$$

Let us suppose that this probability is positive (we name $\Omega_1$ the non-negligible set
where $\theta_n(\omega)$ diverges to $+\infty$; the same arguments would show the result when the
limit is $-\infty$). Let $\omega$ be in $\Omega_1$. We have $\theta_n(\omega) \leq \theta^*$ for only a finite number of $n$.
Let us show that on an event $\Omega \subset \Omega_1$ with positive measure, for $n$ large enough,
$h_n(\theta_n(\omega)) > 0$. First, we know that $P_n$ follows a Beta distribution. This is why
$\forall n$, $\mathbb{P}(P_n = 0) = 0$. Then, the Borel-Cantelli Lemma gives that

$$\mathbb{P}(\exists N\ \forall n \geq N\ P_n > 0) = 1.$$

As $\Omega_1$ has a positive measure, we know that there exists $\Omega_2 \subset \Omega_1$ with positive measure
such that $\forall \omega \in \Omega_2$, $\theta_n(\omega) \to +\infty$ and for all $n$ large enough, $P_n(\omega) > 0$. Since

$$h_n(\theta_n(\omega)) = P_n(\omega)\left[F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n(\omega)) - \alpha\right],$$

we have now to show that, on a set $\Omega_3 \subset \Omega_2$ of positive measure,

$$F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n(\omega)) - \alpha > 0.$$

As $\theta_n(\omega)$ diverges to $+\infty$, we can find $D$ such that for $n$ large enough, $\theta_n(\omega) > D > \theta^*$.
Then,

$$F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n(\omega)) - \alpha = F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n(\omega)) - F_{Y^x}(\theta^*)$$

$$= \left[F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n(\omega)) - F_{Y^{B_n^{k_{n+1}}(x)}}(D)\right] + \left[F_{Y^{B_n^{k_{n+1}}(x)}}(D) - F_{Y^x}(D)\right] + \left[F_{Y^x}(D) - F_{Y^x}(\theta^*)\right].$$

First, $F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n(\omega)) - F_{Y^{B_n^{k_{n+1}}(x)}}(D) \geq 0$ because a cumulative distribution function
is non-decreasing. Then, we set $\eta = F_{Y^x}(D) - F_{Y^x}(\theta^*)$, which is a finite positive value. To deal
with the remaining term, we use our assumption A1:

$$F_{Y^{B_n^{k_{n+1}}(x)}}(D) - F_{Y^x}(D) \geq -M(x)\|X - x\|_{(k_{n+1},n)}.$$

We know, thanks to Lemma 5.5, that $\|X - x\|_{(k_{n+1},n)}$ converges almost surely to 0.
Then, there exists a set $\Omega_3 \subset \Omega_2$ of positive probability such that for all $\omega$
in $\Omega_3$, the previous reasoning is true. And for $\epsilon < \frac{\eta}{M(x)}$, there exists a rank $N(\omega)$ such that
if $n \geq N(\omega)$,

$$(13) \qquad F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n(\omega)) - F_{Y^x}(\theta^*) \geq 0 - M(x)\epsilon + \eta > 0.$$

Finally, for $\omega \in \Omega_3$ (a set of positive measure), we have shown that after a
certain rank, $h_n(\theta_n(\omega)) > 0$. This implies that on $\Omega_3$,

$$\lim_n \left[\theta_n(\omega) + \sum_{j=1}^{n} \gamma_j h_{j-1}(\theta_{j-1}(\omega))\right] = +\infty,$$

which is absurd because in the previous part we proved that $T_n$ is almost surely convergent.
Then $\theta_n$ does not diverge to $+\infty$ or $-\infty$.


Now, we will show that $(\theta_n)$ converges almost surely. In the rest of the proof,
we reason $\omega$ by $\omega$ as in the previous part. To ease the reading, we no longer
write $\omega$ and $\Omega$. Thanks to Equation (12) and to the previous subsection, we
know that if $(\theta_n)$ did not converge, then with positive probability the sequence $(\theta_n)$ would satisfy

(a) $\theta_n + \sum_{j=1}^{n} \gamma_j h_{j-1}(\theta_{j-1})$ converges to a finite limit,

(b) $\liminf \theta_n < \limsup \theta_n$.

Let us suppose that $\limsup \theta_n > \theta^*$ (we will find a contradiction; the same argument
would allow us to conclude in the other case). Let us choose $c$ and $d$ satisfying $c > \theta^*$
and $\liminf \theta_n < c < d < \limsup \theta_n$. Since the sequence $(\gamma_n)$ converges to 0, and since
$(T_n)$ is a Cauchy sequence, we can find a rank $N$ such that, for any two integers $n$ and
$m$, $N \leq n < m$ implies

$$(a)\ \ \gamma_n \leq \frac{d - c}{3(1 - \alpha)}, \qquad (b)\ \ \Big|\sum_{j=n}^{m-1} \gamma_{j+1}\xi_{j+1}\Big| \leq \frac{d - c}{3}.$$

We choose $m$ and $n$ so that

$$(14) \qquad \begin{cases} (a)\ N \leq n < m \\ (b)\ \theta_n < c,\ \theta_m > d \\ (c)\ n < j < m \Rightarrow c \leq \theta_j \leq d. \end{cases}$$

This is possible since beyond $N$, the distance between two iterations will be either

$$\alpha\gamma_n \leq \frac{\alpha(d - c)}{3(1 - \alpha)} < (d - c),$$

because $\alpha < \frac{3}{4}$, or

$$(1 - \alpha)\gamma_n \leq \frac{d - c}{3} < (d - c).$$

Moreover, since $c$ and $d$ are chosen to have an iteration inferior to $c$ and an iteration
superior to $d$, the algorithm will necessarily go through the segment $[c, d]$. We then take
$n$ and $m$ to be the times of entry and exit of the segment. Now,

$$\theta_m - \theta_n \leq \frac{d - c}{3} - \sum_{j=n}^{m-1} \gamma_{j+1} h_j(\theta_j) \leq \frac{d - c}{3} - \gamma_{n+1} h_n(\theta_n),$$

because for $n < j < m$, we get $\theta^* < c \leq \theta_j$ and we have already shown that in this case,
$h_j(\theta_j) > 0$. We then only have to deal with $\theta_n$. If $\theta_n > \theta^*$, we can apply the same result
and then

$$\theta_m - \theta_n \leq \frac{d - c}{3},$$

which is in contradiction with (b) of Equation (14). When $\theta_n \leq \theta^*$,

$$\theta_m - \theta_n \leq \frac{d - c}{3} - \gamma_{n+1} h_n(\theta_n) \leq \frac{d - c}{3} + \gamma_{n+1}(1 - \alpha) \leq \frac{d - c}{3} + \frac{d - c}{3} < (d - c),$$

which is still a contradiction with (b) of (14). We have shown that the algorithm
converges almost surely.


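5.2.3. The algorithm converges almost surely to $\theta^*$. Again we reason by contradiction.
Let us name $\theta$ the limit and suppose that $\mathbb{P}(\theta \neq \theta^*) > 0$. With positive probability, the
sequence $(\theta_n)$ converges to $\theta$ and we can find $e_1$ and $e_2$ such that

$$\begin{cases} (a)\ \theta^* < e_1 < e_2 < \infty \\ (b)\ e_1 < \theta < e_2, \end{cases}$$

(or $-\infty < e_1 < e_2 < \theta^*$, but the arguments are the same in this case). Then, for $n$ large
enough, we get

$$e_1 < \theta_n < e_2.$$

Finally, on the one hand, $(T_n)$ and $(\theta_n)$ are convergent, so we also know that the
sum $\sum_k \gamma_{k+1} h_k(\theta_k)$ converges almost surely. Let us then show that, on the other hand,
$h_n(\theta_n) = P_n\left(F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n) - \alpha\right)$ is lower bounded. First we know thanks to Lemma 5.4
that for $1 > \epsilon > 1 - \beta$ and $\epsilon_n = \frac{1}{(n+1)^{\epsilon}}$,

$$\mathbb{P}(P_n \leq \epsilon_n) \leq \exp\left(-\frac{3(n+1)^{1-\epsilon}}{8}\right).$$

This is the general term of a convergent sum. Therefore, the Borel-Cantelli Lemma gives

$$\mathbb{P}(\exists N\ \forall n \geq N\ P_n > \epsilon_n) = 1.$$

Moreover, as we have already seen in Equation (13), since $\theta_n > e_1 > \theta^*$,

$$F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n) - \alpha \geq 0 - M(x)\|X - x\|_{(k_{n+1},n)} + F_{Y^x}(e_1) - F_{Y^x}(\theta^*).$$

Then, when $n$ is large enough so that

$$\|X - x\|_{(k_{n+1},n)} \leq \frac{F_{Y^x}(e_1) - F_{Y^x}(\theta^*)}{2M(x)}$$

holds, we have

$$F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n) - \alpha \geq \frac{F_{Y^x}(e_1) - F_{Y^x}(\theta^*)}{2}.$$

Finally there exists a set $\Omega$ of positive probability such that, $\forall \omega \in \Omega$,

$$\sum_{k=1}^{n} \gamma_{k+1} h_k(\theta_k) \geq \frac{F_{Y^x}(e_1) - F_{Y^x}(\theta^*)}{2} \sum_{k=1}^{n} \gamma_{k+1} P_k \geq \frac{F_{Y^x}(e_1) - F_{Y^x}(\theta^*)}{2} \sum_{k=1}^{n} \frac{1}{(k+1)^{\gamma+\epsilon}},$$

which is a contradiction (with the first point above) because the sum is divergent ($\gamma + \epsilon < 1$).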


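5.3. Proof of Theorem 2.2: Non-asymptotic inequality on the mean square
error. Let $x$ be fixed in $[0,1]$. We want to find an upper bound for the mean square
error $a_n(x)$ using Lemma 5.7. In the sequel, we will need to study $\theta_n(x)$ on the event
$A_n$ of Lemma 5.4. We therefore begin by finding a link between $a_n(x)$ and the mean square
error on this event.

$$a_n(x) = \mathbb{E}\left[(\theta_n(x) - \theta^*(x))^2 (\mathbf{1}_{A_n} + \mathbf{1}_{A_n^c})\right]$$

$$\leq \mathbb{E}\left[(\theta_n(x) - \theta^*(x))^2 \mathbf{1}_{A_n}\right] + \mathbb{E}\left[(\theta_n(x) - \theta^*(x))^2 \mathbf{1}_{A_n^c}\right]$$

$$\leq \mathbb{E}\left[(\theta_n(x) - \theta^*(x))^2 \mathbf{1}_{A_n}\right] + C_1 \exp\left(-\frac{3(n+1)^{1-\epsilon}}{8}\right)$$

$$(15) \qquad \leq \mathbb{E}\left[(\theta_n(x) - \theta^*(x))^2 \mathbf{1}_{A_n}\right] + C_1 \exp\left(-\frac{3n^{1-\epsilon}}{8}\right),$$

thanks to Lemma 5.4 and for $n \geq N_0$.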


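Let us now study the sequence $b_n(x) := \mathbb{E}\left[(\theta_n(x) - \theta^*)^2 \mathbf{1}_{A_n}\right]$. First, for $n \geq 0$,

$$b_{n+1}(x) \leq \mathbb{E}\left[(\theta_{n+1}(x) - \theta^*(x))^2\right].$$

But,

$$(\theta_{n+1}(x) - \theta^*(x))^2 = (\theta_n(x) - \theta^*(x))^2 + \gamma_{n+1}^2\left[(1 - 2\alpha)\mathbf{1}_{Y_{n+1} \leq \theta_n(x)} + \alpha^2\right]\mathbf{1}_{X_{n+1} \in kNN_{n+1}(x)}$$
$$- 2\gamma_{n+1}(\theta_n(x) - \theta^*(x))\left(\mathbf{1}_{Y_{n+1} \leq \theta_n(x)} - \alpha\right)\mathbf{1}_{X_{n+1} \in kNN_{n+1}(x)}.$$

Taking the expectation conditional on $\mathcal{F}_n$, we get

$$\mathbb{E}_n\left((\theta_{n+1}(x) - \theta^*(x))^2\right) \leq \mathbb{E}_n\left((\theta_n(x) - \theta^*(x))^2\right) + \gamma_{n+1}^2 \mathbb{P}_n\left(X_{n+1} \in kNN_{n+1}(x)\right)$$
$$- 2\gamma_{n+1}(\theta_n(x) - \theta^*(x))\left[\mathbb{P}_n\left(Y_{n+1} \leq \theta_n(x) \cap X_{n+1} \in kNN_{n+1}(x)\right) - \mathbb{P}_n\left(X_{n+1} \in kNN_{n+1}(x)\right)F_{Y^x}(\theta^*)\right].$$

Using the Bayes formula, we get

$$\mathbb{E}_n\left((\theta_{n+1}(x) - \theta^*(x))^2\right) \leq \mathbb{E}_n\left((\theta_n(x) - \theta^*(x))^2\right) + \gamma_{n+1}^2 P_n
- 2\gamma_{n+1}(\theta_n(x) - \theta^*(x))P_n\left[F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n(x)) - F_{Y^x}(\theta^*(x))\right].$$

Let us split the double product into two terms representing the two errors we made by
iterating our algorithm.

$$(16) \qquad \mathbb{E}_n\left((\theta_{n+1}(x) - \theta^*(x))^2\right) \leq (\theta_n(x) - \theta^*(x))^2 + \gamma_{n+1}^2 P_n$$
$$- 2\gamma_{n+1}(\theta_n(x) - \theta^*(x))P_n\left[F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n(x)) - F_{Y^x}(\theta_n(x))\right]$$
$$- 2\gamma_{n+1}(\theta_n(x) - \theta^*(x))P_n\left[F_{Y^x}(\theta_n(x)) - F_{Y^x}(\theta^*(x))\right].$$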
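We now use our hypotheses. By A1,

$$\left|F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n(x)) - F_{Y^x}(\theta_n(x))\right| \leq M(x)\|X - x\|_{(k_{n+1},n)},$$

and by A3,

$$|\theta_n(x) - \theta^*(x)| \leq \sqrt{C_1}.$$

Thus,

$$-2\gamma_{n+1}(\theta_n(x) - \theta^*(x))P_n\left[F_{Y^{B_n^{k_{n+1}}(x)}}(\theta_n(x)) - F_{Y^x}(\theta_n(x))\right] \leq 2\gamma_{n+1}\sqrt{C_1}M(x)P_n\|X - x\|_{(k_{n+1},n)}.$$

On the other hand, thanks to A4 we know that

$$(\theta_n - \theta^*)\left[F_{Y^x}(\theta_n(x)) - F_{Y^x}(\theta^*(x))\right] \geq C_2(x,\alpha)\left[\theta_n(x) - \theta^*(x)\right]^2.$$

Coming back to Equation (16), we get

$$\mathbb{E}_n\left((\theta_{n+1}(x) - \theta^*(x))^2\right) \leq (\theta_n(x) - \theta^*(x))^2(\mathbf{1}_{A_n^c} + \mathbf{1}_{A_n}) + \gamma_{n+1}^2 P_n$$
$$- 2\gamma_{n+1}(\theta_n(x) - \theta^*(x))^2 C_2(x,\alpha)P_n + 2\gamma_{n+1}M(x)\sqrt{C_1}P_n\|X - x\|_{(k_{n+1},n)}.$$

To conclude, we take the expectation:

$$b_{n+1}(x) \leq C_1\mathbb{P}(A_n^c) + b_n(x) - 2\gamma_{n+1}C_2(x,\alpha)\mathbb{E}\left[(\theta_n(x) - \theta^*)^2 P_n\right]
+ \gamma_{n+1}^2\mathbb{E}(P_n) + 2\gamma_{n+1}\sqrt{C_1}M(x)\mathbb{E}\left[P_n\|X - x\|_{(k_{n+1},n)}\right].$$

But, by definition of $A_n$, we get

$$-2\gamma_{n+1}C_2(x,\alpha)\mathbb{E}\left[P_n(\theta_n(x) - \theta^*)^2\right] \leq -2\gamma_{n+1}\epsilon_n C_2(x,\alpha)\mathbb{E}\left[(\theta_n(x) - \theta^*(x))^2\mathbf{1}_{A_n}\right]
= -2\gamma_{n+1}\epsilon_n C_2(x,\alpha)b_n(x).$$

Finally,

$$b_{n+1}(x) \leq b_n(x)\left(1 - 2C_2(x,\alpha)\gamma_{n+1}\epsilon_n\right) + e_{n+1},$$

with

$$e_{n+1} := C_1\mathbb{P}(A_n^c) + \gamma_{n+1}^2\mathbb{E}(P_n) + 2\gamma_{n+1}\sqrt{C_1}M(x)\mathbb{E}\left[P_n\|X - x\|_{(k_{n+1},n)}\right].$$

Now using Lemmas 5.6, 5.4 and 5.1 we get, for $n \geq N_0$,

$$e_n \leq d_n := C_1\exp\left(-\frac{3n^{1-\epsilon}}{8}\right) + 2\sqrt{C_1}M(x)C_3(d)\gamma_n\left(\frac{k_n}{n}\right)^{1+\frac{1}{d}} + \gamma_n^2\frac{k_n}{n}.$$

The conclusion holds thanks to Lemma 5.7: for $n \geq N_0 + 1$,

$$(17) \qquad b_n(x) \leq \exp\left(-2C_2(x,\alpha)(\kappa_n - \kappa_{N_0})\right)b_{N_0}(x) + \sum_{k=N_0+1}^{n}\exp\left(-2C_2(x,\alpha)(\kappa_n - \kappa_k)\right)d_k.$$

But thanks to Assumption A3, we have already shown that $b_{N_0}(x) \leq a_{N_0}(x) \leq C_1$. To
conclude, we re-inject Equation (17) in Equation (15) and obtain, for $n \geq N_0 + 1$,

$$a_n(x) \leq \exp\left(-2C_2(x,\alpha)(\kappa_n - \kappa_{N_0})\right)C_1 + \sum_{k=N_0+1}^{n}\exp\left(-2C_2(x,\alpha)(\kappa_n - \kappa_k)\right)d_k + C_1\exp\left(-\frac{3n^{1-\epsilon}}{8}\right).$$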

5.4. Proof of Corollary 2.1: Rate of convergence. In this part, we will denote

$$T_n^1 := \exp\left(-2C_2(x,\alpha)(\kappa_n - \kappa_{N_0})\right), \qquad T_n^2 := \sum_{k=N_0+1}^{n}\exp\left(-2C_2(x,\alpha)(\kappa_n - \kappa_k)\right)d_k$$

and

$$T_n^3 := C_1\exp\left(-\frac{3n^{1-\epsilon}}{8}\right).$$

We want to find a simpler expression for those terms to better see their order in $n$. First,
considering $T_n^1$, we see that $a_n(x)$ can converge to 0 only when the sum

$$\sum_{k \geq 1}\frac{1}{k^{\gamma+\epsilon}} = +\infty.$$

This is why we must first consider $\epsilon \leq 1 - \gamma$. As $\epsilon > 1 - \beta$, we have to take $\beta > \gamma$.

Remark 5.1. The frontier case $\epsilon = 1 - \gamma$ is possible, but the analysis shows that it is
a less interesting choice than $\epsilon < 1 - \gamma$ (there is a dependency on the value of $C_2(x,\alpha)$,
but the optimal rate is the same as the one in the case we study). In the sequel, we only
consider $\epsilon < 1 - \gamma$.


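Let us upper-bound $T_n^1$. As $x \mapsto \frac{1}{x^{\epsilon+\gamma}}$ is decreasing, we get

$$T_n^1 = \exp\left(-2C_2(x,\alpha)\sum_{k=N_0+1}^{n}\frac{1}{k^{\epsilon+\gamma}}\right)
\leq \exp\left(-2C_2(x,\alpha)\int_{N_0+1}^{n+1}\frac{dt}{t^{\epsilon+\gamma}}\right)
\leq \exp\left(-2C_2(x,\alpha)\frac{(n+1)^{1-\epsilon-\gamma} - (N_0+1)^{1-\epsilon-\gamma}}{1-\epsilon-\gamma}\right).$$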
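
The sum-integral comparison used in this bound is easy to check numerically. The sketch below is purely illustrative (function names and the tested values of $s = \epsilon + \gamma < 1$ are assumptions): it verifies that $\sum_{k=N_0+1}^{n} k^{-s} \geq \frac{(n+1)^{1-s} - (N_0+1)^{1-s}}{1-s}$, which holds because $t \mapsto t^{-s}$ is decreasing.

```python
def partial_sum(n0, n, s):
    """Sum of k**(-s) for k = n0 + 1, ..., n."""
    return sum(k ** (-s) for k in range(n0 + 1, n + 1))


def integral_lower_bound(n0, n, s):
    """Integral of t**(-s) over [n0 + 1, n + 1], for s != 1."""
    return ((n + 1) ** (1 - s) - (n0 + 1) ** (1 - s)) / (1 - s)


if __name__ == "__main__":
    for s in (0.6, 0.8, 0.95):
        for n in (10, 100, 1000):
            print(s, n, partial_sum(2, n, s), integral_lower_bound(2, n, s))
```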


Then, $T_n^1$ (just like $T_n^3$) is exponentially small as $n$ grows. To deal with the second
term $T_n^2$, we first study the order in $n$ of $d_n$, which is composed of three terms:

$$d_n \leq C_1\exp\left(-\frac{3n^{1-\epsilon}}{8}\right) + 2\sqrt{C_1}M(x)C_3(d)\,n^{-\gamma+\left(1+\frac{1}{d}\right)(\beta-1)} + n^{-2\gamma+\beta-1}.$$

The first one is negligible (exponentially decreasing). Let us compare the two others,
which are powers of $n$. Comparing their exponents, we get that there exist constants
$C_5(x,d)$ and $C_6(x,d)$ (their explicit form is given in the Appendix) such that

if $\beta \leq 1 - d\gamma$, then for $n \geq N_0 + 1$,

$$d_n \leq \frac{C_5(x,d)}{n^{2\gamma-\beta+1}},$$

if $\beta > 1 - d\gamma$, then for $n \geq N_0 + 1$,

$$d_n \leq C_6(x,d)\,n^{-\gamma+\left(1+\frac{1}{d}\right)(\beta-1)}.$$

Remark 5.2. Let us detail how one can find C 5 (it is the same reasoning for Cq). If 
fi < 1 — d'y, we know that when re will he big enough, the dominating term ofdn will be the 
one in . Then, it is logical to search a constant C^{x, d) such that dn > No +1, 


, .C 5 {x,d) 

- ^27-/3+1 

Such a constant has to satisfy, for all n > Nq + 1, 


C 5 {x, d) > Cl exp 



2 ^iM{x)CM 

^-7+(l-/3)/d 


Since /3 < 1 — d'y, the map x i-A is positive and decreasing. Then its 

maximum is reached for re = Nq + 1. Moreover, the map x 1 —)• Ci exp 
is also positive and is decreasing on an +oo[. It also has a maximum. The previous 
inequality is then true for 


C^{x,d) := max Ci exp --re 

n>Ao+l y 8 

Let us study the two previous cases. 
Study of when (3 > 1 — d'y 


l-e 


2 ^iM{x)CM , . 

{No + l)-7+(l-/3)/<i 


To upper-bound these sums, we use arguments from [8], which studies a stochastic
algorithm to estimate the median in a Hilbert space. The main arguments are comparisons
between sums and integrals. Indeed, for $n \geq N_0 + 2$ and $n \geq N_3$, where $N_3$ is
such that

$$\forall n \geq N_3, \quad \left\lfloor\frac{n}{2}\right\rfloor \geq N_0 + 1,$$


n—1 


Tf = Ce{x,d) exp - 2 C 2 (a;,a) Y 


k — Nq +1 

LtJ 


j=/c+l 


Ce{x,d) 

' n'''+d+ahi-/ 3 ) 


-P ■ 


= CQ{x,d) exp I — 2 C 2 (a:,a) —7 . / 

^ ) p+(i+3)d-/5) 


k — Nq +1 
n—1 


+ Ceix,d) Y, exp - 2 ( 72 (x, a) Y. 


1 


Ce{x,d) 


j=k+l 


Lf J +1 
=: 5'i -p 5'2 -P S '3 . 

First, the function x 1 —P x~^~'^ is decreasing on ]0, -poo[ then 


j^+'r I fc7+(l+3)(l-« „7+(1+3)(1-/3) 
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^^ \ 
S 2 <CQ{x,d) ^ exp (- 2 C' 2 (x,a) / -^dx) 

_I I I T \ k-\-l ^ J 

(n+ 


fc=LtJ+i 

= C'6(x,d)exp ( —2C'2(x,a) 


fc7+(l+3)(l-/5) 


1 — 7 — e 

n—1 / 

y~] exp ( —2C'2(x,a) 

fc=L=J+i ^ 


(A: + l) 


1—7—e 


1 —7 —e ) ^7+(i+^)(i-/3) 


Then, taking, 1 — j 3 < e < min((l — d'y), (l + ^) (1 — /?)), we have since A: > [|J + 1 


S2 < C'6(x, d) exp 


(h) 


n—1 




exp -2C2(x,a)—;- 

V 1 — 7 — e 

fc=LtJ+i ^ ' 


k'^+^ 


Now, since for A: > 1 , 


lY+7 / 2 

kj - 1^1 


we get 


S2 < Cg{x, d) exp -2C2 (x, a) 




1 — 7 — e 

n—1 




26+7 


i: exp(-2C.(x,a)7‘'''’" 

L^J +1 


1 


— 7 — e J (k 1)'^+^ 


Since the function x 1—)• exp 7C'2(x, a) j is decreasing on 

define the integer Ni{x,a) the rank such that 


I 00 

7 +e ’ 


, we also 


> / ^ N I ^ I ^ 2C'2(x, a) 

Vn > iVi x,a), [-J + 1 > -^- 

2 e + 7 


For n > Ni{x, a) we get 
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S 2 < Ce{x,d)exp -2C'2(a;, a) 


2i^+h)i^-d)+7 

1 — 7 — e 


n(i+3)(i-/5)-<^ 

1 

dx 


< 


fc=[f j+iUJ+^ 7 / / 

—--rexp -2(72(x,a)—- .,, 1^1 ^ 

^(72(a;,a) V 1 —7 —e / 7 ^(i+3)(i-p)-*^ 


1 — e — 7 


exp(^2(72(a;,a)^^5^^^^ - exp (^ 2 C 2 (x, a) ^^2 J + 2) 

Ceix,d) Cjjx^d^a) 1 

' 2(72(a;, a) 7j,(i+ 3)(1“/^)“<^ 2 7x-'=+(i+3)(i“-®) 

Let us now deal with the term ^i. As k < [ 5 J, we have 


l-£-7\-I 


< 


Then, 


n 


E 

j=k+l 


1 



1 

„e+7 


LtJ 


Si=Ce{x,d) exp —2(72(x,a) 


fc=-Vo + l 

ItJ 

< Cg{x, d) exp (—C 2 (x, a)n^~ 


j=k+l 


j^+T' I fc 7 +(l-/ 3 )(l+^) 


fc=l 


fc7+(i-^)(i+a) 


<C6(x,d)exp(-C2(x,ay ^ ^ ' 

Thanks to the exponential term, S'! is insignificant compared to S 2 whatever is the 
behaviour of the sum E 7 (1 /3)(i+tj)^ gQ ig Then, denoting N 2 {d,x) the 

k 

rank after which we have 


S, + S, + T^ + T^< 


Ct{x^ a, d) 


we get, in the case where $\beta > 1 - d\gamma$ and $1 - \beta < \epsilon < \min\left((1 - \gamma), \left(1 + \frac{1}{d}\right)(1 - \beta)\right)$, for
$n \geq \max(N_0, N_1(x,\alpha), N_2(d,x))$,

$$a_n(x) \leq \frac{C_7(x,\alpha,d)}{n^{-\epsilon+\left(1+\frac{1}{d}\right)(1-\beta)}}.$$

Study of $T_n^2$ when $\beta \leq 1 - d\gamma$:

Using the same arguments, we conclude that for $1 - \beta < \epsilon < \min(1 - \beta + \gamma, 1 - \gamma)$ and
$n \geq \max(N_0, N_1(x,\alpha), N_2(d,x))$ (see the Appendix for precise definitions of these ranks),
there exists a constant $C_8(x,\alpha,d)$ such that the mean square error satisfies

$$a_n(x) \leq \frac{C_8(x,\alpha,d)}{n^{\gamma-\beta+1-\epsilon}}.$$

5.5. Proof of Corollary 2.2: choice of best parameters $\beta$ and $\gamma$. Let us now
optimize the rate of convergence obtained in the previous theorem. When $\beta > \gamma$ and $\beta \leq
1 - d\gamma$, the rate of convergence is of order $n^{-\gamma+\beta-1+\epsilon}$. To optimize it, we have to choose
$\epsilon$ as small as possible. Then, we take $\epsilon = 1 - \beta + \eta_\epsilon$. The rate becomes $n^{-\gamma+\eta_\epsilon}$. Then, we
also have to choose $\gamma$ as large as possible. In this area, there is only one point at which
$\gamma$ is the largest: this is the point $(\gamma,\beta) = \left(\frac{1}{1+d}, \frac{1}{1+d}\right)$. Since we have to take $\beta > \gamma$, the
best couple of parameters, in this area, is $\left(\frac{1}{1+d}, \frac{1}{1+d} + \eta_\beta\right)$. These parameters yield a
rate of convergence of $n^{-\frac{1}{1+d}+\eta}$.

When we are in the second area, the same kind of arguments allows us to conclude to
the same optimal point with the same rate of convergence.

In Figure 6 we use the numerical simulations of Section 3 to illustrate the previous
discussion.


[Figure 6: panel titled "Mean square error, n=200".]

Figure 6. Theoretical behaviour of the MSE as a function of $\beta$ and $\gamma$


We have finally shown that

$$a_n(x) \leq \frac{C_9(x,\alpha,d)}{n^{\frac{1}{1+d}-\eta}},$$

where the constant is the minimal constant between $C_7(x,\alpha,d)$ and $C_8(x,\alpha,d)$ computed
with the optimal parameters $(\gamma,\beta,\epsilon)$.

6. Appendix 2: Recap of the constants


Let us sum up all the constants we need in this paper.

6.1. Constants of the model. We denote:

• $M(x)$ the constant of continuity of the model, that is

$$\forall B \in \mathcal{B}_x,\ \forall t \in \mathbb{R}, \quad |F_{Y^B}(t) - F_{Y^x}(t)| \leq M(x)r_B.$$

• $C_{input}$ the positive lower bound of the density $f_X$ of the inputs law.

• $C_g(x)$ the positive lower bound of the density of the law of $g(x, \epsilon)$.

6.2. Compact support. We denote:

• $[L_Y, U_Y]$ the compact set containing the values of $g$.

• $[L_X, U_X]$ the compact set containing the support of the distribution of $X$.

• $[L_{\theta_n}, U_{\theta_n}] := [L_Y - (1 - \alpha), U_Y + \alpha]$ the segment in which $\theta_n$ can take its values
($\forall x$).

• $U_{\|\cdot\|}$ the upper bound of the compact support of the distribution of $\|X - x\|$ ($\forall x$).

6.3. Real constants. We denote:

• $\sqrt{C_1} := U_Y + \alpha - L_Y$. $C_1$ is the bound, uniform in $\omega$ and $x$, of $(\theta_n(x) - \theta^*(x))^2$.

• 6*2 (x, a) := min ^C'g(x), ^ is the constant such that 

[Fyx (6'„(x)) - Fya, {0*{x))] [Onix) - 9*(x)] > C 2 {x,a) {9n{x) - 9*{x)f . 


Cs{d) ^ (^1 + ^ + . 


C4(d) := 


CUx d) ■= max C^ exo n 27 -/ 3 +i , ‘^V^M{x)C 3 {d) 

C 5 (x, d) . ^ Cl exp J n + ^ + 1- 

Cii{x,d):= max Ci exp n'^+(^+^^(^“^)+2v^M(x)C'3(d)H- ^ 

n>iVo+i \ 8 J (iVo + 

^optim ^ + - ^ -j- 

n>Ao + l V 8 J (iVo + + 

:= max Ci exp n(^^3)“3(T+a)“''/3(i+a)_|_2y/^Af(a;)C3(d) + 

n>iVo + l \ ® / 


/ . 1 I 1 I 1 I ' 

f]\[Q X) d^d(i+d)^i+d"^ d 

F7[x,a,a) C2(x,a) 

i \ 22a-/3+ic'5(x,d) 

C%{x,a) := - 


C' 9 (x, a, d) := min 


2^+3 d{l+d) 2TGd-’'/3+lc°^’*™(x,d) 

C2{x,a) ’ C2(x,a) 


c'wM y (s+iSSW-
6.4. Integer constants. We denote:

• No := 2 

• Ni{x, a) is the rank such that n > Ni{x, a) implies 

n 2C2{x,a) 

2 € + 7 

• N 2 {x, a, d) is the integer such that Vn > N 2 {x,a,d), 

a) If /3 < 1 — d'y, 


Ss + s. + n + r's 




where ;= exp j — 2 C 2 {x, a) k | ;= Ci exp 

fc=Afo+l 


-3N 


LfJ 


'S's := and5i := C' 6 (x,(i)exp(- 2 C' 2 (x,a)n^ " '^)^k ^ 


b) li /3 > 1 — d'y 


k=l 


Q me I Ti + tO < 

03 + *^1 + 4n + 4n < 2^7-/3+l-e ’ 


where ;= exp | — 2 C 2 {x, a) k j , ;= Ci exp 

k=No-\-l 


-3n^ 


LfJ 

S 3 ■= ^ 2 ^-l+i) and 5i := Cs exp(-2C'2(x, a)ni-"-T') ^ 

fc=i 

• is the rank such that Vn > N^, > iVo + 1. 

• N 4 {x, a, d) ;= max (A^o + 2, Ni{x, a), N 2 {x, a, d), N^). 


./ 3 )(l+l/d)